
RFC: New tf.print #14

Merged · 5 commits · Feb 21, 2019
Conversation

@tomerk (Contributor) commented Aug 24, 2018

Review period open until 2018-09-10

New tf.print

Status: Proposed
Author(s): Tomer Kaftan (Google)
Sponsor: Asim Shankar (Google)
Updated: 2018-08-24

A proposal for a new tf.print operator, as part of TF 2.0 development.

Summary

This doc proposes tf.print, a new TensorFlow printing approach that closely
mirrors the standard Python print API whether or not code is executing
eagerly. It also provides long-requested functionality for both eager and
session-based execution, such as more meaningful tensor summarization,
support for printing nested data structures that contain tensors, and
controllable logging levels.
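
For orientation, a minimal usage sketch of the proposed API, mirroring the examples discussed later in this thread:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
# Prints python primitives, tensors, and nested structures alike;
# in graph mode the returned op must still be run or depended upon.
print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
```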

@googlebot commented: CLAs look good, thanks!

@ewilderj added the RFC: Proposed (RFC Design Document) and 2.0 (TensorFlow 2.0 development) labels Aug 27, 2018
@ewilderj ewilderj added this to Open reviews in RFC management Aug 27, 2018
output stream or logging level. The inputs may be dense or sparse Tensors,
primitive python objects, data structures that contain Tensors, and printable
python objects. Printed tensors will recursively show the first and last
`summarize` elements of each dimension.
Member commented:

Is there any way to print the whole elements of a tensor, say, summarize=None or -1 ?

@tomerk (Contributor Author) replied:

We can go ahead and add that functionality.
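
For illustration, a sketch of what that could look like, with summarize=-1 as the hypothetical sentinel under discussion (not yet part of the proposal):

```python
import tensorflow as tf

tensor = tf.range(100)
# Hypothetical: a negative (or None) `summarize` requests printing every
# element instead of the first/last `summarize` of each dimension.
print_op = tf.print(tensor, summarize=-1)
```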

tensor = tf.range(10)
print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
with tf.control_dependencies([print_op]):
    doubled_tensor = tensor * 2
Comment:

This doesn't seem much better than tf.Print is right now, from a graph perspective. Don't the problems listed in the overview continue to exist? Except now, instead of just an Identity, I have to add control deps.

@tomerk (Contributor Author) replied on Aug 30, 2018:

From a session-based graph building perspective those problems do continue to exist. My understanding is this is a lower priority to solve if we're moving to an eager & graph function-first world.

We could add support for tf.print as a context manager in session-based graph building mode that automatically adds the control dependency, so users could say:

with tf.print(...):
    ...
    out = ...
session.run(out)

Instead of having to explicitly add it with tf.control_dependencies.

Alternatively, it might be better to suggest that users add the return value of tf.print to their session.run code rather than add control_dependencies, if that's a better user experience. That would be harder w/ control flow and functions in graph mode though.

We could even try implicitly adding tf.prints to a graph collection that automatically gets executed at session.run. I'm more hesitant to do that, because it would require a lot of care with control flow & functions, and to work nicely with graph pruning.
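
A sketch of the session.run alternative, reusing the names from the excerpt above:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
print_op = tf.print("tensors:", tensor, output_stream=sys.stdout)
doubled_tensor = tensor * 2
with tf.Session() as sess:
    # Run the print op alongside the real fetches, instead of wiring it
    # in as a control dependency.
    sess.run([print_op, doubled_tensor])
```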

Comment:

FWIW, I agree with @karmel and feel that the new proposal is a worse experience for session-based graph users. If solving this problem is a low priority for the team, please reconsider deprecating tf.Print until a better solution is available.

print_op = tf.print("tensors:", tensor, {2: tensor * 2},
                    output_stream=sys.stdout)
with tf.control_dependencies([print_op]):
    doubled_tensor = tensor * 2
Comment:

nit: it's potentially slightly confusing to use tensor * 2 both as the tensor being printed and here, since it becomes less clear which doubled tensor is being printed below.

@tomerk (Contributor Author) replied:

Ah that's true, I can update this example.


Note: This op is only partially compatible with Jupyter notebooks and colabs.
Because it prints to the C++ standard out / standard error, this will go
in the notebook kernel's console output, not in the notebook cell output.
Comment:

This seems like a fairly big caveat. Isn't one of the primary use-cases running in a notebook or similar, when you're debugging some graph/op?

@tomerk (Contributor Author) replied:

Yeah, I think so too. There's unfortunately no clean solution because python notebooks don't support this very well. As I mention lower down in the doc:

*   Python notebooks require complex capturing logic to send C++ output to
    notebook cells, and the solutions are often not totally portable across
    operating systems / C++ runtimes.
*   Mixing C++ output and python print output requires regular, careful
    flushing to prevent things happening in strange orders. This definitely
    has portability issues.
*   If we want to do something about this, a possible approach could be to
    have the PrintV2 kernel execute a python print if we detect a
    jupyter/colab notebook environment and the current device has a
    python runtime, rather than using the C++ logging/printing methods.
*   Alternatively, we could provide utilities to capture the C++
    stdout/stderr in Jupyter notebook outputs as part of TensorFlow.
*   With either approach, we would have to be very careful w/ device
    placement in distributed multi-machine/multi-device settings to ensure
    that the print device is the notebook kernel CPU.

There is some internal tooling for colabs that captures stderr and prints it to the cell output, which we could look into releasing. But it's not perfect: it only flushes/prints the log output once the cell finishes executing. This can be painful w/ long-running execution, so we'd need to add a regular poll for flushing.
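
To make the capture idea concrete, a rough POSIX-only sketch (not part of the proposal) of redirecting the process-level stderr file descriptor, which is where the C++ output lands:

```python
import os
import sys
import tempfile

def capture_cpp_stderr(fn):
    """POSIX-only sketch: capture C++-level stderr (fd 2) while fn runs,
    then replay it on the Python-level stream a notebook cell can show."""
    saved_fd = os.dup(2)
    with tempfile.TemporaryFile(mode="w+b") as tmp:
        os.dup2(tmp.fileno(), 2)
        try:
            fn()  # e.g. run ops that execute PrintV2
        finally:
            os.dup2(saved_fd, 2)  # restore the real stderr
            os.close(saved_fd)
        tmp.seek(0)
        # The flushing caveats discussed above still apply: C++ buffers
        # may hold output until the underlying stream is flushed.
        sys.stderr.write(tmp.read().decode())
```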

printed output will be separated by spaces. Inputs may be python
primitives, tensors, data structures such as dicts and lists that
may contain tensors (with the data structures possibly nested in
arbitrary ways), and printable python objects.
Comment:

Does this include data structures with mixed types? I.e., [python int, tensor, python object, python list of tensors, ...]

@tomerk (Contributor Author) replied:

Yes, those print fine as well. I can add some text about that to the description.
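
For example, a sketch of the intended behavior with a mixed, nested structure:

```python
import tensorflow as tf

tensor = tf.range(3)
# Mixed python int, tensor, string, and a nested list/dict of tensors:
tf.print(1, tensor, "some object", [tensor, {"doubled": tensor * 2}])
```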


output_stream: The output stream or logging level to print to. Defaults to
sys.stderr, but sys.stdout, tf.logging.info, tf.logging.warning,
tf.logging.error, and tf.logging.fatal are also supported.
Comment:

In the case of colabs + similar, where do tf.log messages end up going?

@tomerk (Contributor Author) replied:

The tf.log messages should end up going to stderr, with extra logging context & the log filtering levels applied. If the device is set to the colab kernel, this should end up in the kernel console stderr logs. (But I can check this to confirm).
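
A usage sketch of the output_stream argument as described in the excerpt above:

```python
import sys
import tensorflow as tf

tensor = tf.range(10)
tf.print("to stdout:", tensor, output_stream=sys.stdout)
# Routes through the TF logging machinery at INFO level:
tf.print("to the info log:", tensor, output_stream=tf.logging.info)
```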


```python
@tf_export("strings.format")
def string_format(template, inputs, placeholder="%s", summarize=3, name=None):
```
Comment:

Any reason the default placeholder shouldn't reflect "new style" string formatting?

@tomerk (Contributor Author) replied on Aug 30, 2018:

I'm not totally sure which formatting style on that page you're referring to. If you just mean changing the default placeholder to "{}", that's totally fine and easy to do.

If you mean adding support for positional arguments, for modifiers of the number format, or even for full-fledged python formatted string literals (https://docs.python.org/3/reference/lexical_analysis.html#f-strings), there are more steps involved. I'm hesitant to spend too much effort on string formatting as part of this design, because the focus is on the printing functionality. At the same time, we don't want the API of this design to limit adding better functionality to the strings.format operator in the future.

Below I list a bunch of possibilities for how string formatting could be made more fully-fledged (although I'm not sure if any of these are worthwhile right now):

A lot of complexity comes about from having to combine both python string processing and c++ string processing. We need to process python code to do the python object formatting & template substitution, and then have to use c++ code to process the template with python objects inserted, format tensors, and substitute them into the template.

If you are referring to support for {} and positional arguments: We could carefully combine the nest utilities & python formatting to convert the string into a template that either the c++ abseil strings substitute method or printf can support.

  • The abseil template would only be able to support 10 tensors positionally referenced in the template.
  • Printf would be able to support NL_ARGMAX positionally specified tensors (which is implementation-dependent, but is at least 9).
  • These would probably be slower than the current strsplit/strcat-based implementation of the kernel, but the absl method should be faster than the printf based one.
  • Printf might be more susceptible to security flaws.
  • We would have to be careful if any python objects inserted into the template include strings that conflict w/ the printf or absl substitute placeholders. It would probably be easier to escape / modify the printed objects correctly for absl substitute than for printf.

If you mean printf named argument support where tensors in the format string are requested as strings (but python objects may be formatted w/ the number formatting): Same as the above support for {} and positional arguments.

If you mean printf style number formatting for tensors: We would no longer be able to call directly into strsplit/substitute/printf methods. We would need to manually check the string to find where tensors need to be inserted and how their elements should be formatted. We would then need to extend the tensor summarization code w/ support for printing elements using different number formats.

If you mean fully-fledged python f-strings: This would be really hard and would involve manually checking the python environment scopes / manually parsing & rewriting the python code to hook references to tensors.

Comment:

To clarify, all I meant was having {} be the default, since that also plays nicely with ints, floats, and others by default, whereas %s does not. Plus, one assumes that people will gradually expect that to be the way they write string formatters as py3 takes over.
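
With {} as the default placeholder, usage would look something like this sketch against the proposed strings.format API:

```python
import tensorflow as tf

tensor = tf.range(10)
# One "{}" per entry in `inputs`, substituted in order:
formatted = tf.strings.format("tensor: {}, doubled: {}",
                              (tensor, tensor * 2))
```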

inputs: A list of `Tensor` objects, or a single Tensor.
The list of tensors to format into the template string. If a solitary
tensor is passed in, the input tensor will automatically be wrapped as a
list.
Comment:

Can this also support python primitives/objects as the print method does? Imagine:

def my_func(tensor, some_param=5):
    raise ValueError(tf.strings.format(
        "You passed in {} with param {}, which is not allowed",
        (tensor, some_param)))

Seems easier than having to tf.strings.format the tensor, then normal-string-format that with the param.

@tomerk (Contributor Author) replied:

See my above comment on supporting {} and positional arguments. This is definitely possible, but there would be some caveats caused by using absl string substitute or the c++ printf.

printing not work. In colab they might switch to tf.print then suddenly be
confused why nothing is printing.
* It may make sense for eager graph functions to automatically hook the
python print method to replace it with calls to `tf.Print`. This would
Comment:

You mean tf.print here presumably?

@tomerk (Contributor Author) replied:

Yes, will update that!

* This also means that the structure of printed lists/dicts is only
captured the first time a `tf.print` is executed. If tensors are added to or
removed from a list printed in `tf.print` multiple times, those changes to the
list will not be captured when printing again!
Comment:

This seems like it will be a confusing gotcha for users. Is there some way to indicate that the value is stale?

@tomerk (Contributor Author) replied:

I just ran this by @akshayka, and this actually shouldn't be an issue.
Non-eager tf control flow doesn't support modifying python objects anyway, and the tfe.defun graph functions should automatically detect that the cached value is stale and will re-generate the ops.

I'll remove this caveat from the doc.

@karmel commented Sep 13, 2018:

Design Review notes

  • Most comments were addressed in the existing CLs.
  • Concerns that in graph mode, tf.print is still painful:
    • Could use a context manager (`with tf.print(...)`), but that's not better.
    • Any way to get automatic control deps? defun.
    • This concern mostly goes away in 2.0, so it is fine to leave as-is for now.
    • → Include a defun example and test (see the sketch after these notes).
  • For the registration of the op, no special formatting for now -- just a placeholder attribute. Seems fine for now.
  • There is an explicit `with device(cpu)`; is this a problem?
    • joshl: Seems like it will fix more than it breaks.
    • apassos: Not safe for non-eager TPU code.
    • There are places this would help -- GPU code block in eager.
    • Then the user should specify `with cpu`.
    • Eager should respect soft placement, but that's a separate issue.
    • → Remove explicit device placement.
  • Concerns about memory/performance if you are copying the tensor to print onto the CPU?
    • Turn off printing for performance; that is expected.
    • But memory? Shouldn't be a concern generally: if the tensor fits on the GPU, it will fit on the CPU it is being transferred to.
    • Don't worry for now. If we need to add summary ops later, we can.
  • Add a deprecation warning to tf.Print.
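
As a sketch of the defun example requested in the notes, assuming the tf.contrib.eager.defun API of the time: inside a defun, automatic control dependencies make the print execute without explicit wiring.

```python
import tensorflow as tf

tf.enable_eager_execution()
tfe = tf.contrib.eager

@tfe.defun
def double(x):
    # Inside a defun, the PrintV2 op picks up automatic control
    # dependencies, so it runs without explicit control_dependencies.
    tf.print("x:", x)
    return x * 2

result = double(tf.range(10))
```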

@mitar commented Sep 16, 2018:

I do not think this is a good design if we want compatibility with Python logging. Currently, the caller of tf.print decides what the output stream is. I think this is not good. Ideally, with tf.print you would just send data to be printed/logged, and then from a central location you could configure exactly where, how, and at which levels messages should be logged. It should not be the role of tf.print to decide whether something goes to stderr or a logging stream; it should maybe just set the level of the message.

In this way, libraries get a clear and standard way to log messages, and users of those libraries can then configure how they want to display those messages (iPython output, stderr, stdout, redirecting to Python logging, saving to a database, whatever).

@ewilderj ewilderj moved this from Open reviews to Awaiting Committee in RFC management Sep 18, 2018
@ewilderj (Contributor) commented:

@tomerk are there notes from the review meeting to add to this PR comment thread? And are there any updates?

Pending adding those, I'm ready to merge this.

@tomerk (Contributor Author) commented Sep 19, 2018:

Hi @ewilderj, @karmel added the notes above. Before merging I still need to figure out whether it makes sense to do anything re: @mitar's comment first, or whether that should come later (e.g., having tf.print default to None as the logging level, in which case it would look it up from a global config).

@ewilderj (Contributor) commented:

Great, thanks. Sorry I missed the notes. Please feel free to ping me when further action is needed.

@ewilderj ewilderj moved this from Awaiting Committee Notes to In Revision in RFC management Sep 19, 2018
@tomerk (Contributor Author) commented Sep 19, 2018:

Okay, so:
@mitar Thank you for your suggestion! To do it we would need to set up some sort of global configuration for Tensorflow logging & automated print op device placement (e.g. see the concerns mentioned in the doc re: printing in colabs). We've decided that this is out of scope for this specific design doc. However, we can always look into extending tf.print's default behavior in the future to support this.
@ewilderj you're free to merge this now, thank you!

@mitar commented Sep 19, 2018:

However, we can always look into extending tf.print's default behavior in the future to support this.

How do you see this being possible if we now train everyone to hard-code where to output logs? Then later on we will face the problem of either ignoring what they explicitly requested (output to stdout) or not being able to suppress it.

I really believe that providing the community tooling here is critical, because otherwise there will be so much noise that this logging will become useless. Especially in code where TF is used deep inside some other codebase.

Also, why not simply build this on top of Python logging? Then TF does not have to reinvent anything here. Just use Python logging, and TF can be just a Python logger.
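
A sketch of the configuration model being described here, assuming TF messages were routed through a standard Python logger (not how tf.print actually behaves):

```python
import logging
import sys

# If tf.print routed messages through a logger, users could decide
# centrally which levels get through and where output goes:
logger = logging.getLogger("tensorflow")
logger.setLevel(logging.INFO)                # filter by level
handler = logging.StreamHandler(sys.stdout)  # or a file, DB handler, ...
logger.addHandler(handler)
```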

tensorflow-copybara pushed a commit to tensorflow/tensorflow that referenced this pull request Sep 19, 2018
This CL adds a new `tf.print` operator that more closely aligns with the standard python `print` method, and deprecates the old `tf.Print` operator (to be removed in v2.0).

It follows the design doc specified in tensorflow/community#14 and additionally incorporates the community feedback and design review decisions.

This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level.

The formatting op is exposed at `tf.strings.Format`. A new python method is exposed at `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In Eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either added to `sess.run`, or used as a control dependency for other operators being executed.

As compared to the previous print function, the new print function:
- Has an API that more closely aligns with the standard python3 print
- Supports changing the print logging level/output stream
- Allows printing arbitrary (optionally nested) data structures, as opposed to just flat lists of tensors
- Supports printing sparse tensors
- Changes the printed tensor format to show a more meaningful summary (recursively printing the first and last elements of each tensor dimension, instead of just the first few elements of the tensor regardless of dimension).

PiperOrigin-RevId: 213709924
@tomerk (Contributor Author) commented Sep 19, 2018:

Hi @mitar,
With regard to not training users improperly now: I'll run it by some folks again to see about making the output_stream kwarg default to None instead of stderr, and perhaps adding a default_print_output_stream to the context (which itself would default to stderr).

As for your second question, as well as why adding python-based printing & custom logging functionality to tf.print would be out of scope for this specific document:
Building tf.print on top of python print itself (using tf.py_func) was my first inclination as well.
Unfortunately we can't naively build tf.print like this because that places an operator in the graph whose device requires a python runtime. This would cause issues when running in distributed settings, on TPU devices, on embedded devices, and when using non-python front-ends.

So, properly adding support for this would require configuration options to determine if there is a python runtime, and what device is the client. The print / string_format operator placement then needs to carefully happen according to these configurations. At the same time, we would still need to support exporting the graph to environments without python runtimes, so we would have to be careful to make sure this doesn't cause device placement issues for the exported graphs.

Additionally, the operator kernel itself would need to know whether to switch between a py_func for printing and the C++ logging operations. We would probably still need to maintain some level of consistency between the functionality of both.

This is somewhat related both to the "Device Locations" section in the document, and the various caveats mentioned w/ respect to Colabs.
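
For reference, a sketch of the naive py_func-based approach described above, which pins the op to a device with a Python runtime:

```python
import numpy as np
import tensorflow as tf

def naive_tf_print(*inputs):
    # Sketch only: wrapping python print in tf.py_func places an op in
    # the graph whose device must have a Python runtime, which breaks
    # TPU / embedded / non-python-frontend execution as noted above.
    def _print(*values):
        print(*values)
        return np.int64(0)  # dummy output so the op has a fetchable result
    return tf.py_func(_print, list(inputs), tf.int64)
```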

@mitar commented Sep 19, 2018:

I'll run it by some folks again to see about making the output_stream kwarg default to None instead of stderr, and perhaps adding a default_print_output_stream to the context (which itself would default to stderr).

I think this would be better. Or maybe even not have output_stream on print at all, and just have it configured through the context?

@ewilderj (Contributor) commented:

@tomerk I'll wait for re-confirmation, if you're making changes as per the recent comments.

ganny26 added a commit to ganny26/tensorflow that referenced this pull request Sep 21, 2018
* Add --config=v2 option to the .bazelrc file.

PiperOrigin-RevId: 213027176

* Populate custom name in registration.

PiperOrigin-RevId: 213028338

* Disable the flaky test case in timeline_test

PiperOrigin-RevId: 213034078

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213037039

* Disable flaky gpu_base_test

PiperOrigin-RevId: 213040362

* Added TFE_OpSetAttrTensor() to eager C API.

Also added some experimental C APIs for facilitate the use of eager C APIs in
S4TF compiler.

PiperOrigin-RevId: 213041780

* Generalize TransformFilter method in preparation of NHWC Conv support

PiperOrigin-RevId: 213049674

* [TF:XLA] Remove special base case from BatchDot that has been redundant ever since xla::DotGeneral was added.

PiperOrigin-RevId: 213052269

* Disable flaky keras_test.

PiperOrigin-RevId: 213053512

* Refactored some of the metrics code in compile function for better readability.
- Logic change: Moved getting metric name and function out of the training/eval loops in eager mode
- Moved setting metric attributes on the model out the function which calls metric functions.

PiperOrigin-RevId: 213060143

* Fixed documentation of Optimizer.minimize() for eager mode to match behavior of Optimizer.compute_gradients().

PiperOrigin-RevId: 213060585

* Fix spelling in error message

PiperOrigin-RevId: 213062112

* Makes tf.Variable arguments (non-captured) DT_RESOURCE function inputs.

Previously, tf.Variable arguments to a defun-d Python function were made captured inputs. This change makes it possible to parameterize functions on DT_RESOURCE inputs.

PiperOrigin-RevId: 213064739

* Switch to Eigen::Index in Tensorflow kernels.

Mixing index type doesn't work well with latest Eigen.

PiperOrigin-RevId: 213067224

* Revert PR tensorflow#21997: Fixes the formatting issue pointed out at tensorflow#21762

It breaks. should be s/input_shape/inputs_shape.

PiperOrigin-RevId: 213070141

* Make accessed variable ordering deterministic again when constructing defuns

PiperOrigin-RevId: 213074939

* fix bug of lacking axis when using array.ops.concat in unwrap_and_concat function

* compat: Update forward compatibility horizon to 2018-09-15

PiperOrigin-RevId: 213100589

* [TPU] Deprecate the computation_shape attribute to the TpuReplicate op in lieu of a new num_cores_per_replica.

PiperOrigin-RevId: 213111326

* compat: Update forward compatibility horizon to 2018-09-16

PiperOrigin-RevId: 213161736

* Introduce gmock matchers for TensorFlow nodes

I need these to write readable unit tests for TF graph transformations.  All of
my use cases will live inside tensorflow/compiler so putting it in
tensorflow/compiler/jit for now; but we can move these out if other users are
interested.

In the future we may want to auto-generate type safe versions of these from the
op registrations like we generate C++ wrappers today.

PiperOrigin-RevId: 213186810

* Conditionally allow changing a non-fusion computation root_instruction shape.

PiperOrigin-RevId: 213191899

* Update broken link to intro on ADAGRAD

* Fix some typos in the doc for XlaDynamicSlice

phawkins@ suggested these in cr/212715067 but I accidentally made the changes in
another client.

PiperOrigin-RevId: 213208811

* Improve TFLite iOS doc.

PiperOrigin-RevId: 213210253

* Add ZerosLike to schema.

PiperOrigin-RevId: 213212445

* Implement ZerosLike

PiperOrigin-RevId: 213227615

* Add fill to schema.

PiperOrigin-RevId: 213234759

* compat: Update forward compatibility horizon to 2018-09-17

PiperOrigin-RevId: 213234942

* revised a parameter error

Hi, i found that when firstly use `interpreter `as a parameter pass into `eval_model` function, wrong spell mistake of `interpreter_quant`.

* [XLA:TF] Enable int8 and uint8 support in the bridge for CPU/GPU

The test changes are awkward. None of these are XLA bugs, it's just that the op
definitions in tensorflow are really inconsistent. I tried to infer whether the
limitation is on signed types, index types or just arbitrary. In the latter
case just int8/uint8 is blacklisted, we should probably lift that requirement
at some point.

PiperOrigin-RevId: 213243906

* README s/tensorflow.contrib/tensorflow.python/.

PiperOrigin-RevId: 213262445

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213275003

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213281730

* Removing unused code comment in AutoGraph error rewriting.

PiperOrigin-RevId: 213282302

* [tf.data] Adding support for `tf.data.AUTOTUNE` as a special value for the `num_parallel_calls` argument of `tf.data.Dataset.map()`, `tf.data.Dataset.interleave()`, and `tf.contrib.data.map_and_batch()`.

When `tf.data.AUTOTUNE` is specified, the level of parallelism is determined at runtime. The underlying mechanism instruments the input pipeline to build a performance model and then uses the model to find the optimal values for the parallelism knobs.

PiperOrigin-RevId: 213283297

* Increase tolerance in linalg_grad_test to fix tensorflow#19935

Fixes tensorflow#19935

PiperOrigin-RevId: 213286535

* Minor docstring change: update link to saved_model_cli.

PiperOrigin-RevId: 213296537

* [Java]: Release 1.11.0-rc0

PiperOrigin-RevId: 213305616

* Fix and complete StreamExecutor's DoFusedConvolve:
* bias_nd is set to have CUDNN_DATA_FLOAT, even though BiasType is not float.
* double is supported but not exposed through the public interface.
* DoFusedConvolveImpl has duplicated information in its template parameter list.

PiperOrigin-RevId: 213308435

* Numerics tweak to symmetric quantization.

PiperOrigin-RevId: 213314024

* Do not segfault in Conv2d/3d if cuDNN version is too low.

PiperOrigin-RevId: 213315830

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213316034

* [XLA] Allow adding extra instructions in HloComputation::CloneWithReplacements

PiperOrigin-RevId: 213316504

* GradientTape: Documentation formatting tweak.

PiperOrigin-RevId: 213318051

* [XLA] Add ReduceWindow test.

PiperOrigin-RevId: 213322116

* Raise error on encountering bad indentation during Autograph parsing.

PiperOrigin-RevId: 213324570

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213326167

* Add missing `watch` call to GradientTape documentation.

PiperOrigin-RevId: 213326503

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213326581

* Add support for predicting models with learning_phase.

PiperOrigin-RevId: 213327633

* Compute `axes` and `free` statically during graph creation.

PiperOrigin-RevId: 213327709

* Tweak test tolerance in segment_reduction_ops_test.py, which is otherwise flaky.

PiperOrigin-RevId: 213327863

* Improve the error messages in custom_export_strategy.

PiperOrigin-RevId: 213334465

* Use a single thread in eager if inter_op_parallelism_threads isn't specified.

PiperOrigin-RevId: 213336463

* Keep only weak references to variables in graph functions

This enables cleanup of the variables referenced in defunned methods of objects when the object is garbage collected. Since one PolymorphicFunction is created per @Defun, decorated methods before this change held on to all of the variables referenced in that method for any instance of the class (i.e. variables which should have been object-scoped were scoped to the lifetime of the class definition).

Raises an exception if variables used in the function have been deleted when it is called, which means no local variables.

PiperOrigin-RevId: 213337256

* Fix testing bug where partitioned primals wasn't actually being tested (constructing Variable directly instead of get_variable under scope with partitioner).

PiperOrigin-RevId: 213345447

* Add benchmarks comparing Mkl vs Default Conv2D ops.

PiperOrigin-RevId: 213346439

* Fix _check_is_tensor like _check_is_tensor_or_operation was fixed in tensorflow#22264.

PiperOrigin-RevId: 213346485

* Add api_docs_relpath option.
Eliminate error when copying a file to itself.

PiperOrigin-RevId: 213349424

* Move OvicBenchmarker class from app folder to source folder.

PiperOrigin-RevId: 213349833

* Add generic fallback optimized implementations for dilated DepthwiseConv.

PiperOrigin-RevId: 213350122

* Remove tensorflow/contrib/linalg library.  linalg remains in core.

PiperOrigin-RevId: 213352573

* Fix GraphConstructor and import_graph_def bug with variadic ops.

Prior to this change,
GraphConstructor::PopulateMissingUnusedInputMapKey() didn't correctly
compute the number of outputs for ops with variadic outputs. This
meant that missing_unused_input_map_keys could contain spurious
entries for unused variadic outputs, which could trigger a ValueError
in import_graph_def.

This also adds a new util method in node_def_util.h, NumOutputsForNode().

PiperOrigin-RevId: 213353158

* Fixing the documentation of the parse_sequence_example function.

PiperOrigin-RevId: 213354240

* [tf.data] Introducing `tf.data.Dataset.window(size, shift, stride, drop_remainder)`, which can be used for combining elements of input dataset into "windows". A window
is itself a finite dataset and, among other things, can be used for generalized batching (see tensorflow/community#5 for details).

PiperOrigin-RevId: 213360134

* Add basic op resolver registration to TFLite C API

PiperOrigin-RevId: 213360279

* Update 1.11.0-rc0 version strings to 1.11.0-rc1 (tensorflow#22284)

* Make HLO liveness analysis correctly handle computations with side effect instructions.

PiperOrigin-RevId: 213361904

* Changing `OpInputList` so that it is a forward iterator and taking advantage of the fact in the tf.data kernels.

PiperOrigin-RevId: 213361953

* Increase test timeout for dnn_tree_combined_estimator_test to de-flake.

PiperOrigin-RevId: 213363558

* Fixed bug where a mixture of Variable and PartitionedVariable would break SDCA.  Added new test that fails with `IndexError: list index out of range` in `_get_partitioned_update_ops` without the corresponding fix.

Note that the effect of this bug is minimal, because for Estimator users, it only applies to sparse features that are not partitionable (e.g. [1,]), since all variables are created with the same partitioner in Estimator).

PiperOrigin-RevId: 213365956

* Remove unnecessary side-effect test, since HLO liveness now reports correct
liveness information if a control flow computation contains side effect
instructions.

PiperOrigin-RevId: 213367995

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213368723

* Eliminate VisitableAllocator.

The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking.  The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed.  This usage pattern
fits the SubAllocator better than a general Allocator.  This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.

This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation begining.  This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.

This refactoring will ease an upcoming CL introducing
NUMA specific CPU devices.  It also should fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA.  It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).

PiperOrigin-RevId: 213371553

* Add type checking at the beginning of tpu.shard().

Otherwise a message like "TypeError: Tensor objects are only iterable when eager execution is enabled. To iterate over this tensor use tf.map_fn." will be thrown, which is confusing.

PiperOrigin-RevId: 213371676

* Remove some dead code after migration from python to C.

PiperOrigin-RevId: 213372027

* Increase test timeout for image_grad_test to de-flake.

PiperOrigin-RevId: 213372241

* Num elements fastpath for eager tensors.

PiperOrigin-RevId: 213377426

* Break cwise_opt_test.py into 3 files to speed up testing, since we are up against the 50 shard limit.

PiperOrigin-RevId: 213377776

* Add Keras TPU support for the new metrics.

PiperOrigin-RevId: 213378552

* Register fp16 reduce_max on GPU

PiperOrigin-RevId: 213383647

* Fix unused variable error on powerpc.

PiperOrigin-RevId: 213386145

* [tf.data] Fixing an error in the optimization loop.

PiperOrigin-RevId: 213386401

* Refactor out the metadata_ops set from const_analysis to a per-op bit; NFC

PiperOrigin-RevId: 213389224

* Automated rollback of commit 185aa89

PiperOrigin-RevId: 213394522

* Support scoped_allocator_ops for renamed device.

This fixes tensorflow#22274.

Signed-off-by: Bairen Yi <byi@connect.ust.hk>

* [XLA] Refactor conv_ops emitters to make them reusable.

PiperOrigin-RevId: 213398930

* compat: Update forward compatibility horizon to 2018-09-18

PiperOrigin-RevId: 213414462

* Simplify the interface of conversion_call to allow a ConversionOptions object that can be more easily extended. Currently any new argument needs changing a lot of call sites and there is redundancy in argument documentation.

Note: this does not modify the public symbols yet - it's not clear whether we want to complicate their interface. However we may want to use it in to_graph and to_code.
PiperOrigin-RevId: 213433379

* Add a fuzzer to test DecodeCompressed

PiperOrigin-RevId: 213441868

* Automated rollback of commit 19d66a9

PiperOrigin-RevId: 213453719

* Creating an InstantiatedCapturedFunction that captures the instantiated state of a function to be executed, separating it out from the non instantiated regular state such as function name, captured inputs etc.

This allows us to truly separate Dataset kernel creation from Iterator creation i.e. each time a dataset is created that uses functions, we create only a CapturedFunction whereas we create an InstantiatedCapturedFunction each time a new iterator is created.

PiperOrigin-RevId: 213456128

* Extend template expansion support for arithmetic expressions.

PiperOrigin-RevId: 213462334

* [SE] Restore int8x4 data types if that's the requested DataLayout for fused conv

This broke in a recent refactoring.

PiperOrigin-RevId: 213497416

* Link to readme for distribution strategy from distribute.py and package init file, so that folks looking at API documentation can find the readme as well.

PiperOrigin-RevId: 213499832

* Only start_step/end_step on GradientTape if executing eagerly.

This prevents creating a context where none is required.

PiperOrigin-RevId: 213500408

* Register FakeResourceUpdateOp for the right op

Before this CL the PartiallyDeclusterPassTest.DontDuplicateResourceVarOps test
was buggy, in that it wasn't testing what it was supposed to test.

PiperOrigin-RevId: 213501558

* Eliminate VisitableAllocator.

The visitor pattern is used to allow pre-registration of memory for
DMA access, e.g. for fast GPU/CPU i/o and for RDMA networking.  The
VisitableAllocator interface was introduced to support this use some
time ago, prior to SubAllocators. Memory registration works best if
it's done infrequently, on large pieces of memory, rather than on
every piece that's dynamically allocated/freed.  This usage pattern
fits the SubAllocator better than a general Allocator.  This change
moves memory allocation visitor access to SubAllocator and eliminates
the VisitableAllocator subclass of Allocator.

This change also more rigorously enforces the requirement that all
Visitors be declared prior to memory allocation begining.  This is
accomplished by requiring that Visitors be provided to the SubAllocator
constructor.

This refactoring will ease an upcoming CL introducing
NUMA specific CPU devices.  It also should fix some performance
pitfalls (e.g. accidental use of PoolAllocator) introduced by an
earlier refactoring of ProcessState that was also in preparation for
NUMA.  It restores the default use of the cpu_allocator() value (i.e.
no SubAllocator) by model executions that don't use allocation
visitors (since visitor registration must precede the first allocation,
hence can be detected at that time).

PiperOrigin-RevId: 213505655

* Clean up remove_negation pass in Grappler.

PiperOrigin-RevId: 213520177

* Add error reporting TFLIte C API

PiperOrigin-RevId: 213526489

* [TF:XLA] Document that the order of control predecessors matters.

PiperOrigin-RevId: 213528296

* Automated rollback of commit b1ff7c2

PiperOrigin-RevId: 213528716

* Updates documentation of Estimator.predict to note that an issue with yielding and graph context.

PiperOrigin-RevId: 213528782

* "Isolate" must-be-constant side effecting operations

I first tried to fix this issue in cr/209996730 but didn't quite fix the problem
for for XLA_* devices.  A node assigned to an XLA_* device must be compiled so
the cr/209996730 fix of simply not compiling the nodes doesn't generalize to
XLA_* devices.  Instead we now "isolate" these nodes, only putting them in a
trivial one-node cluster.  For non-XLA devices even this trivial cluster is
ignored because of flags->tf_xla_min_cluster_size.

I was initially considering a more principled data-flow-analysis based solution
but then decided the upfront work isn't worth it until I see a clear motivating
example.

PiperOrigin-RevId: 213531437

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213536334

* Reject RESHAPE if new_shape tensor is not provided.

PiperOrigin-RevId: 213541006

* Return OrderedDict as eval results should be sorted by global_step key.

PiperOrigin-RevId: 213541935

* Add ConstantScalar, WithPredicate, Disjunction, and OpAnyOrder (where Op
is a commutative binary operator) to the XLA pattern matcher.

PiperOrigin-RevId: 213543953

* Convert the new metric instances to (value_op, update_op) tuple in the EstimatorSpec.

PiperOrigin-RevId: 213548081

* Add a new function to load kernel libraries and library folders.

PiperOrigin-RevId: 213549838

* Add layout information to logging.

PiperOrigin-RevId: 213551652

* Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 213552354

* Update the grappler plugin to support the @Defun generated function and ops.

PiperOrigin-RevId: 213554813

* [tf.data] Add a test for state persistence between iterators over the same MapDataset.

PiperOrigin-RevId: 213555982

* Getting DNNModel to work with the new feature columns.

PiperOrigin-RevId: 213561495

* First commit for functional while loop.
Supports single and double derivatives but does not supporting nesting yet.

tensorflow/community#13

PiperOrigin-RevId: 213565971

* Putting `NodeExecStatsWrapper` behind an interface and providing a light-weight statistics collector for tf.data performance modeling.

PiperOrigin-RevId: 213566889

* [TF:XLA] Change HloPtrComparator to work across HLO modules. Declaring the method out of line does not increase compile time.

PiperOrigin-RevId: 213571783

* Add xla.compile(), a low-level API that compiles graph with XLA.

PiperOrigin-RevId: 213574904

* Modify Timeline Analysis to consider allocations in order.

PiperOrigin-RevId: 213589710

* Implement sort op for CPU.

Also don't allow parallelization for the sort op in parallel_task_assignment.

PiperOrigin-RevId: 213592046

* Replace DLOG(FATAL) with an Unimplemented error.

In tensorflow we don't have DLOG, and we should not use LOG(FATAL).

PiperOrigin-RevId: 213595376

* Enable XlaSort and TopKV2 for CPU backend.

PiperOrigin-RevId: 213595499

* compat: Update forward compatibility horizon to 2018-09-19

PiperOrigin-RevId: 213595705

* Run CPU tests remotely.

Being able to run CPU tests remotely while running GPU tests locally required
multiple changes:
1. Unify how we tag GPU tests in TF; we now always use tf_cuda_tests_tags().
2. Tag tests using tf_cuda_tests_tags() with 'local' and 'gpu'; this makes
   them not run on non-gpu builds and always runs them locally.

PiperOrigin-RevId: 213601626

* jacobian: manually setting the output shape in the output.

PiperOrigin-RevId: 213610324

* Enable tests for CPU and GPU backends that involve XlaSort.

PiperOrigin-RevId: 213611371

* [TF:XLA] Enable ClipByValue test for integer types

This has been fixed a while ago. Even though TF allows ClipByValue for complex
types it's not implemented anywhere (and it doesn't make sense for complex
numbers) so blacklist complex types.

PiperOrigin-RevId: 213615429

* Distributions should raise the original exception (log_prob not implemented) instead of the fallback exception (prob not implemented).

Additionally, in a nested structure of transformed distributions, it can be useful to know which distribution is raising this error.

PiperOrigin-RevId: 213618306

* Enable while_test for the GPU backend.

PiperOrigin-RevId: 213618350

* Add interface for HLO passes which run on HloModuleGroup.
Derive HloModulePass and HloModuleGroupPass from HloPassInterface which run module-scoped and module-group-scoped respectively. Replace all existing uses of HloPassInterface with HloModulePass because all existing passes are module-scoped. Also rewrite HloPassPipeline to support both module-scoped and module-group-scoped passes.

PiperOrigin-RevId: 213629604

* Automated rollback of commit 9fe1778

PiperOrigin-RevId: 213630404

* Treat kDomain instruction as a pure pass-through in HloValue

It doesn't access the data in any way similarly to kTuple so it should
be handled the same way.

PiperOrigin-RevId: 213630620

* Add build rules for mnist_softmax_xla.py so it can work internally.

PiperOrigin-RevId: 213637804

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213640434

* [XLA:CPU] Add an emitter for erfinv(double) and erfinv(half).

This is used by the random number generator. Same algorithm as for float, just with more
precision. fp16 is upcasted to fp32 and then processed with the float algorithm.

PiperOrigin-RevId: 213648736

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213651158

* Fix estimator_training test flakiness.

PiperOrigin-RevId: 213653403

* Return error message with illegal input rather than check-failing in op_kernel.

PiperOrigin-RevId: 213653853

* Force-place embedding variables on CPUs ein eager mode.

This avoids problems which happen because most optimizers do not have sparse updating gpu kernels implemented.

Fixes tensorflow#22042

PiperOrigin-RevId: 213654354

* Fix documentation markdown

PiperOrigin-RevId: 213655969

* Enable large constant array deduping by default.
If this causes trouble (makes graph visualizations harder to read, etc)
then consider increasing the default value of dedupe_array_min_size_bytes.

PiperOrigin-RevId: 213656796

* Python interface for Boosted Trees model explainability (currently includes directional feature contributions); fixed ExampleDebugOutputs bug where it errors with empty trees.

PiperOrigin-RevId: 213658470

* Add a space to the error message.

PiperOrigin-RevId: 213661062

* Re-enable flaky keras_test

PiperOrigin-RevId: 213665390

* Remove non-determinism in model-parallel compilation

PiperOrigin-RevId: 213667385

* Fixed broken links

* [XLA:TF] Re-disable testRandomUniformIsInRange

The bug is still there and makes this test flakily fail with fp16.

PiperOrigin-RevId: 213669453

* Convert more kernel signatures to use runtime shapes.

PiperOrigin-RevId: 213673402

* Adds an experimental package group to allow Swift and ObjC targets to depend on the "c_api" target.

PiperOrigin-RevId: 213673549

* Simplify ir_emitter_unnested so that it doesn't take a look at conv
custom call and try to understand what's inside. convolution_thunk does
it anyway.

PiperOrigin-RevId: 213676051

* Fixes in ResolveReorderAxes.
The main issue is we were keeping the input array, updating it in place and discarding the output array. That was a problem when the input array had multiple consumer ops. Now we're keeping the output array instead, which is the correct thing to do. However, in order to minimize disruption, we keep using the input array's name whenever possible, by means of some array renamings.

PiperOrigin-RevId: 213678219

* Two improvements in resolve_tensorflow_matmul:
1. Before inserting a new Transpose node, check if there already is one that
   may be reused. In practice, there are two cases: either the array being
   transposed is a constant (by far the most common case) or it's not.
    * If it is constant, then this doesn't really make a difference:
      ResolveConstantTranspose runs anyway, eliminating these Transpose nodes
      and also mootifying this change as it leaves no Transpose node to be
      reused. So in that case, constant-array-deduping is really the only
      thing that prevents duplication of data.
    * If it is not constant, that's where this new logic really helps, as
      the resulting Transpose nodes are here to stay in the final graph,
      and this avoids inserting more than are needed.
2. transpose_a is not supported. However, rather than CHECK-fail, it's more
   useful to have this graph transformation bail with a log message. The
   resulting 'unresolved' MatMul node could still be handled in some way
   at the TFLite level, or we could end up having support for MatMul per se.

PiperOrigin-RevId: 213678294

* Remove the CHECK added for debugging.

PiperOrigin-RevId: 213681549

* Fixes bits/bytes unit error in comment.

PiperOrigin-RevId: 213684048

* [tf.data] MapVectorization optimization: C++ conversion framework to vectorize a MapDefun function. Also implements conversion for two ops: Cast and Unpack.

PiperOrigin-RevId: 213686720

* Remove LOG(INFO) in MetaOptimizer:Optimize as this currently produces a large number of debugging outputs in the INFO log that look like:

I0917 16:20:11.073992    9191 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.079458    9191 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.084827   12447 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph
I0917 16:20:11.089359   12447 meta_optimizer.cc:334] Starting optimization for grappler item: tf_graph

After this change those lines will simply no longer appear.

RELNOTES: n/a
PiperOrigin-RevId: 213690759

* Added ABSL_DEPRECATED annotations to various deprecated TensorFlow functions.

PiperOrigin-RevId: 213693027

* Add min/max version for depthwise conv.

PiperOrigin-RevId: 213698663

* Allow the tape tensor to have unknown shapes.

This is done by making the TapeTensor a template rather than a concrete struct.

PiperOrigin-RevId: 213700425

* Create a steps_per_run variable to be updated correctly in the fit loop to make sure we run fit for the right number of steps.

PiperOrigin-RevId: 213706042

* Boosted trees: Add error messages when tree complexity parameter is not properly set.

PiperOrigin-RevId: 213706101

* This CL adds a new `tf.print` operator that more closely aligns with the standard python `print` method, and deprecates the old `tf.Print` operator (to be removed in in v2.0).

It follows the design doc specified in tensorflow/community#14 and additionally incorporates the community feedback and design review decisions.

This CL adds two new internal graph operators: a StringFormat operator that formats a template string with a list of input tensors to insert into the string and outputs a string scalar containing the result, and a PrintV2 operator that prints a string scalar to a specified output stream or logging level.

The formatting op is exposed at `tf.strings.Format`. A new python method is exposed at `tf.print` that takes a list of inputs that may be nested structures and may contain tensors, formats them nicely using the formatting op, and returns a PrintV2 operator that prints them. In Eager mode and inside defuns this PrintV2 operator will automatically be executed, but in graph mode it will need to be either added to `sess.run`, or used as a control dependency for other operators being executed.

As compared to the previous print function, the new print function:
- Has an API that more closely aligns with the standard python3 print
- Supports changing the print logging level/output stream
- allows printing arbitrary (optionally nested) data structures as opposed to just flat lists of tensors
- support printing sparse tensors
- changes printed tensor format to show more meaningful summary (recursively print the first and last elements of each tensor dimension, instead of just the first few elements of the tensor irregardless of dimension).

PiperOrigin-RevId: 213709924

* Go: Update generated wrapper functions for TensorFlow ops.
PiperOrigin-RevId: 213716034

* [XLA] Add R2 strided slice test.

PiperOrigin-RevId: 213718019

* Add VerifiedHloModule class.
VerifiedHloModule is derived from HloModule and verifies itself on destruction. This is designed to be used in HloVerifiedTestBase. This replaces the current mechanism which verifies HloModules in the TearDown method. The VerifiedHloModule approach is cleaner (less state on the test object) and more capable because these verified HLO modules can be passed to methods which require taking ownership of the module (eg, HlotestBase::Execute).

This change required some changes to the parser which enables constructing the parsed HloModule into an already allocated HloModule. Some trivial changes to HloModule are required as well.

PiperOrigin-RevId: 213718126

* Allow setting a global override for the "allow_growth" GPU option via the TF_FORCE_GPU_ALLOW_GROWTH environment variable.

PiperOrigin-RevId: 213728460

* TOCO transformations updated to support dilated depthwise convolution.

PiperOrigin-RevId: 213729750

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213729979

* Fix the error message thrown when running eval on pod

PiperOrigin-RevId: 213730668

* Copy Tensor._handle_data from external_capture to placeholder for Variant tensors in Graph mode defun.
This allows inferring the shape of values popped from TensorLists inside defuns.
Remove "Resource" from {Set|Get}ResourceHandleShapeAndType since the same functions are re-usable for variants.
Eager mode fix coming in a future changelist.

PiperOrigin-RevId: 213735462

* BEGIN_PUBLIC
It's desirable to run int64 compute on GPU. Rolling back the folowing CL.

*** Original change description ***

Register a new Sum op for T:int64 and Tidx:int32

END_PUBLIC

Automated rollback of commit a9a5929

PiperOrigin-RevId: 213736058

* Update TF Lite subsite

PiperOrigin-RevId: 213737482

* Internal change.

PiperOrigin-RevId: 213749129

* Fix typo error in grapper remapper optimizer.

* Speeds up _random_flip for batched images.

PiperOrigin-RevId: 213753728

* Add feature_group_count parameter of Convolution op to xla_client.py.

This parameter has been added to HLO to support depthwise convolution.

PiperOrigin-RevId: 213761790

* Add AOT test case for XlaSort.

The only tensorflow op that uses XlaSort is nn.top_k, so we add a test case
using nn.top_k.

PiperOrigin-RevId: 213763591

* Automated rollback of commit 31c0857

PiperOrigin-RevId: 213764810

* Internal change.

PiperOrigin-RevId: 213770000

* Automated rollback of commit da3357e

PiperOrigin-RevId: 213771631

* compat: Update forward compatibility horizon to 2018-09-20

PiperOrigin-RevId: 213773990

* [XLA:TF] Whitelist quantized types for CPU/GPU

These have the same behavior as unquantized types so we can just pass them
through to XLA (which converts them to unquantized types). They're supposed to
be used with special ops, none of which are currently implemented by XLA.
Casting (without quantization) and basic math works fine though.

These do not have a corresponding numpy type, so only tests using TF types will
see them.

PiperOrigin-RevId: 213781650

* Fix typo in _EnforceShapeInvariant.

PiperOrigin-RevId: 213801006

* Callbacks should count the steps correctly in the multi step case

PiperOrigin-RevId: 213829360

* [tf.data] Use vectorization_utils::VectorizeMapDefun in MapVectorization optimization

PiperOrigin-RevId: 213840320

* [SE] Use absl instead of TF classes where an absl version exists

With the exception of StrCat all of these are using absl already, this change
just removes one layer of indirection.

PiperOrigin-RevId: 213846036

* [data-stats] Adds number of filtered elements as scalar summary, also adds number of filtered elements to monitoring counter.

PiperOrigin-RevId: 213846793

* Moving tpu_embedding_config.proto to tpu_embedding_configuration.proto, refactoring it, adding several new fields and an EmbeddingOutputLayout message to provide experimental support for controlling the embedding output.

PiperOrigin-RevId: 213849572

* Replace the OrderedDict with a basic list/dict solution. OrderedDict is problematic to use in eager because of the circular references it creates.

PiperOrigin-RevId: 213862402

* Fix _handle_data of variant and resource type outputs of While op in while_v2.

tensorflow/community#13

PiperOrigin-RevId: 213862844

* Add searchsorted (ie lower/upper bound) op.

PiperOrigin-RevId: 213863392
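
A short sketch of the new op (assuming the documented tf.searchsorted signature, with the sorted sequence along the innermost dimension):

import tensorflow as tf

seq = tf.constant([[1.0, 3.0, 5.0, 7.0]])
vals = tf.constant([[5.0]])
tf.searchsorted(seq, vals, side="left")   # -> [[2]], the lower bound
tf.searchsorted(seq, vals, side="right")  # -> [[3]], the upper bound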

* Modify docs under contrib/distributions to point to tfp.

PiperOrigin-RevId: 213866466

* Updating doc references to tf.distributions to point to tfp.distributions.

PiperOrigin-RevId: 213867606

* Simplifies the ResourceVariable constructor.

PiperOrigin-RevId: 213872127

* This CL adds a Keras-based mobilenet_v2 feature extractor for object detection models.

As part of this CL, we use the Keras mobilenet_v2 application's keyword argument layer injection API to allow the generated network to support the object detection hyperparameters.

PiperOrigin-RevId: 213872175

* [tf.data] Fixes for two recently introduced use-after-free bugs.

1. In ParallelMapIterator, do not call `cond_var_.notify_all()` without holding
   the associated mutex. In some cases, the iterator may have been deleted
   between releasing the lock and notifying the condition variable, which
   leads to a use-after-free. This change applies this style to all uses of
   condition variables in tensorflow/core/kernels/data/.

2. In CapturedFunction::RunAsync(), do not use `shared_ptr` to manage
   the lifetime of objects that (potentially) borrow from runtime
   objects. The present code runs the destructor after the `done()`
   callback is called, but the `done()` callback may be the last
   action in a session, and thus trigger destruction of those borrowed
   objects. In that case, the `shared_ptr` destructor may use the
   borrowed objects after they are freed.

PiperOrigin-RevId: 213872829

* Update ops-related pbtxt files.

PiperOrigin-RevId: 213873471

* Implement TF graph capture.

PiperOrigin-RevId: 213875284

* Fix bug in Pow optimizer rule when broadcasting is involved.
Minor cleanup by moving the helper function ShapesEqual to GraphProperties and adding unit tests for it.

PiperOrigin-RevId: 213876779

* Include the print function in the list of special functions: its name is not found in the namespace in Python 3.

PiperOrigin-RevId: 213879813

* [Java]: Release 1.11.0-rc1

PiperOrigin-RevId: 213882538

* [XLA] Don't create mixed precision operations accidentally

The reshape we created changed the element type unintentionally.

PiperOrigin-RevId: 213883142

* Remove restriction on scope for bypass operators. Previously, the scope had to be of the form 'scope/<arbitrary_text>'. Relax the restriction to handle empty scopes, and enable this change to work for both fused and unfused batch norm layers.

PiperOrigin-RevId: 213883621

* Fix missing TODO.

PiperOrigin-RevId: 213885561

* [tf.data] Some vectorization cleanup

PiperOrigin-RevId: 213886813

* Add more specific ReLU implementation tests.

PiperOrigin-RevId: 213890403

* This CL moves the tf.print logging level tests that are sensitive to OS & environment configurations to a separate test target, and disables running them on Windows.

PiperOrigin-RevId: 213895372

* Split XlaLaunch into XlaCompile and XlaRun; NFC

This CL splits the functionality in XlaLaunch into two separate operations:

 - XlaCompile, responsible for compiling a TF function into a LocalExecutable
 - XlaRun, responsible for executing a LocalExecutable created by XlaCompile

This CL is a stepping stone towards implementing lazy compilation for TF/XLA.
The XlaCompile op is spec'ed to return a boolean indicating whether the
compilation was successful.  Right now that boolean is always set to true by
XlaCompile and its value is otherwise ignored, but in the future it will be used
to indicate whether the TF function was compiled or not, and thus whether we
should execute XlaRun or just directly call the TF function.

XlaLaunch still exists, and will be created by create_xla_launch_op.cc.  In the
future we may consider removing it altogether.  build_xla_launch_ops.cc, now
renamed to build_xla_ops.cc, creates a XlaCompile/XlaRun pair instead of
XlaLaunch.

This CL is organized as follows:

 - jit/ops/xla_ops.cc gets two new XLA-specific operations, XlaCompile and
   XlaRun, described above.  XlaRun redundantly takes the must-be-constant
   inputs to the TensorFlow cluster to keep the implementation simple (simple in
   the sense of similar to XlaLaunch), but I will remove this in a subsequent
   cleanup CL.

 - jit/kernels/xla_ops.cc implements XlaCompile and XlaRun in a fairly
   straightforward manner.  XlaCompile compiles the TF function, puts it in a
   process-global storage, XlaExecutableClosureStore, and produces an int64 key.
   XlaRun uses the key to read out the LocalExecutable and execute it.  I'm not
   sure if XlaExecutableClosureStore should be a resource like
   XlaCompilationCache; I did not immediately see any reason to make it so.

 - There are changes to the various _device files to register XlaCompile and
   XlaRun for the XLA_* devices.

 - Finally, I had to fix some tests that were expecting XlaLaunch in the
   execution timeline.

PiperOrigin-RevId: 213895405

* Change all YAML booleans from True/False to true/false.

PiperOrigin-RevId: 213896057

* It is more computationally efficient to represent resize bilinear as a
depthwise convolution instead of a full convolution, now that depthwise convolution exists in XLA.

PiperOrigin-RevId: 213896333

* [tf.data] Moving auto-tuning optimizations into a background thread, refactoring the API for exposing tunable parameters, and removing `model::Node` from the public API.

PiperOrigin-RevId: 213907565

* Fixes regression to tf.Print that removed square braces around printed tensors.

PiperOrigin-RevId: 213912507

* Support 16-way model parallelism.

PiperOrigin-RevId: 213913013

* Updating doc references from tf.distributions to tfp.distributions.

PiperOrigin-RevId: 213915666

* Update links to tf lite site.

PiperOrigin-RevId: 213917881

* Update links to install pages.

PiperOrigin-RevId: 213917946

* Add an API which gives explicit control over shard sizes and introspection into the number of shards used. This is a variant of threadpool::parallelFor.

PiperOrigin-RevId: 213920649

* Make threading.local not an instance member of collective ops because in python3 threading.local cannot be pickled.

PiperOrigin-RevId: 213928766
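
For context, a minimal reproduction of the underlying Python 3 behavior (the exact error text varies by Python version):

import pickle
import threading

pickle.dumps(threading.local())  # raises TypeError: cannot pickle '_thread._local' object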

* Return model format from LoadSessionBundleOrSavedModelBundle(),
allowing callers to know if we up-converted a SessionBundle to
SavedModel format.

PiperOrigin-RevId: 213937542

* Fix cub include path so that TensorFlow compiles when used as a bazel dependency.

PiperOrigin-RevId: 213942340

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead, which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213944355
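
The migration looks roughly like this (a sketch; some_op is a placeholder, and both methods come from TensorFlowTestCase):

# Before (deprecated; the name wrongly suggests a fresh session per test):
with self.test_session() as sess:
  sess.run(some_op)

# After (explicit that the session may be reused and is not closed on exit):
with self.cached_session() as sess:
  sess.run(some_op)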

* Move from deprecated self.test_session() to self.cached_session().

self.test_session() has been deprecated in 9962eb5 as its name confuses readers of the test. Moving to cached_session() instead, which is more explicit about:
* the fact that the session may be reused.
* the session is not closed even when doing a "with self.test_session()" statement.

PiperOrigin-RevId: 213944932

* keras/training.py: Improve error message.

Inspired by:
https://stackoverflow.com/questions/52428939/eager-mode-optimizers/

PiperOrigin-RevId: 213948133

* Internal change.

PiperOrigin-RevId: 213948394

* [TF:XLA] Bump open source llvm revision to r342644

PiperOrigin-RevId: 213952786

* [XLA:CPU] Re-enable half float tests for unary ops

This was blocked by an LLVM bug, which was fixed in r342542.

PiperOrigin-RevId: 213953743

* compat: Update forward compatibility horizon to 2018-09-21

PiperOrigin-RevId: 213955428

* Added fetch support for attrs classes.

Given a class (using the attrs library):

import attr

@attr.s()
class SampleAttr(object):
  field_1 = attr.ib()
  field_2 = attr.ib()

we will be able to run

obj = SampleAttr(tensor_1, tensor_2)
session.run(obj)  # equivalent to session.run([obj.field_1, obj.field_2])

Please note, this does not need nest flatten support (which is only relevant to the feed_dict argument).

Also, the information in __attrs_attrs__ is provided for extensions (as per the docs: http://www.attrs.org/en/stable/extending.html#extending-metadata) like this and is not an "implementation detail".

PiperOrigin-RevId: 213963978

* Use weakrefs where absolutely safe to do so, in order to reduce the number of circular references. Replace unnecessary OrderedDict with a regular dict.

PiperOrigin-RevId: 213982097

* [TPU] Change the TPU DeviceAssignment class to use a flatter (replica, logical core) indexing scheme for cores.

Previously the DeviceAssignment class mixed both a general concept (a mapping from (replica, logical core) to physical TPU core) and a specific instantiation of that concept, by imposing a particular 3D grid structure on the logical core numbers. This was excessive: while the physical core numbers have a particular structure, there is no need to impose any particular structure on the logical core numbers.

This change simplifies the DeviceAssignment scheme, changing it so logical cores within a replica are numbered sequentially without any particular semantics.

PiperOrigin-RevId: 213984629
@tomerk
Contributor Author

tomerk commented Feb 7, 2019

Hi @ewilderj, a follow-up on all this:
tf.print now successfully prints to Colab notebooks, as a result of adding a log-listener registration mechanism to the TensorFlow C API (which tf.print uses) and registering a listener in interactive environments that prints to the Python stdout.

For now, though, we will be leaving output_stream in tf.print's API because it matches how Python's own print function works. If people want to extend this in the future, they will be able to extend TensorFlow's source with new log listeners, or expose Python log-listener registration beyond interactive environments, and do whatever they want with Python's stdout.
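
For illustration, the parallel with Python's print looks roughly like this (a sketch; in the current API, output_stream defaults to sys.stderr):

import sys
import tensorflow as tf

tf.print("loss:", 0.25, output_stream=sys.stdout)  # analogous to print("loss:", 0.25, file=sys.stdout)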

@ewilderj
Contributor

Thanks for the update.

So to be clear, will this design proposal be resumed, or should I just archive it?

@ewilderj ewilderj added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Feb 12, 2019
@ewilderj ewilderj moved this from In Revision to Accepted RFCs in RFC management Feb 12, 2019
@ewilderj ewilderj merged commit 0530779 into tensorflow:master Feb 21, 2019
karllessard added a commit to karllessard/tensorflow-community that referenced this pull request May 10, 2019
* Adding a doc to deprecate collections

* Responding to Karmel's comments

* Minor fix to VariableTracker sample code

* RFC for random numbers in TensorFlow 2.0

* Changes after some feedback

* Removed 'global_seed' in the main code and showed the design with 'global_seed' in the Questions section.

* Some changes after feedback

* A tweak

* Change after feedback

* A tweak

* changes

* changes

* fix link

* new-rfc

* changes

* Update rfcs/20181225-tf-backend.md

Co-Authored-By: alextp <apassos@google.com>

* Added some considerations about tf.function

* Renamed the internal name "op_generator" to "global_generator"

* Changed seed size from 256 to 1024 bits

* Initial signpost for community meetings

Adding this so there is basic information about how to find the community calendar and get invited to meetings.

* Add iCal link too

* changes

* Initial version of embedding and partitioned variable RFC.

* Fix one formatting issue.

* Fix another formatting issue.

* Use markdown language for the table instead of HTML.

* Add tensorflow/io R Package CRAN release instructions (tensorflow#53)

* Added Design Review Notes

* Make a clear distinction between embedding variables and load-balancing
variables.

* Added decisions below each question, and "how to use generators with distribution strategies".

* Adopted Dong Lin's suggestions

* Add a paragraph pointing out the problem with the `partition_strategy` argument.

* RFC: Move from tf.contrib to addons (tensorflow#37)

* Checkpoint addons RFC for review

* Add code review to RFC

Add future pull request information to criteria

Update modified date

added some description

RFC Move to addons

* Add weight decay optimizers

* Remove conv2d_in_plane

* Add group_norm

* Accept addons RFC

* Update alternatives since `DynamicPartition` and `DynamicStitch` do have GPU kernels.

* Add a section for saving and restore `PartitionedVariable`.

* Mention that variable types can be nested, attention needs to be paid to their saving and restoring mechanism.

* Create README.md (tensorflow#57)

* Split `_state_var` into `_state_var` and `_alg_var` (due to implementation concerns) and changed the status to "Accepted"

* Updated timestamp

* Moved the auto-selection of algorithm from `create_rng_state` to `Generator.__init__`

* Update according to the discussion

* Move performance heuristics into the Distribution Strategy level. We will not expose knobs for users to control;
* Emphasize that embedding support in v2 will all be via the `Embedding` layer. Users can use `tf.compat.v1` to handle embeddings themselves;
* Mention that the default `partition_strategy` in v1 `embedding_lookup` is "mod", which may break users' models when they update to TF 2.0;
* We want to prioritize shuffling embeddings after the 2.0 release;
* We have plans to serialize and deserialize the `Embedding` layer and Distribution Strategies to allow loading a saved model onto a different number of partitions.

* Update release binary build command for SIG IO (tensorflow#58)

This PR updates the release binary build command for SIG IO.

Signed-off-by: Yong Tang <yong.tang.github@outlook.com>

* Add Bryan to SIG IO release team (tensorflow#59)

* Change to accepted

* Add link to TensorFlow IO R package

* Updated link for the friction log. (tensorflow#64)

* Switch DistStrat revised API examples to TensorFlow 2 style. (tensorflow#63)

* RFC: Attention for Dense Networks on Keras (tensorflow#54)

* Design review for "Attention for Dense Networks"

* RFC: Stateful Containers with tf.Module (tensorflow#56)

* Create 20190117-tf-module.md

* Update 20190117-tf-module.md

* Loosen return type for variable properties.

* Use Dense consistently.

Thanks brilee@ for spotting!

* Remove convert_to_tensor from examples.

This wasn't ever required and including it might cause confusion.

h/t pluskid@ gehring@ and awav@

* Remove owned_* methods.

* Document `_flatten`

See tensorflow/tensorflow@5076adf6 for more context.

* Fix typo in module name.

Thanks k-w-w@!

* Update 20190117-tf-module.md

* RFC: New tf.print (tensorflow#14)

* New tf.print proposal

* Attempt to fix table of contents

* Removed not-working TOC label

* Minor updates to the doc.

* Update tf.print to be accepted

* Added design review notes

* Marking doc as accepted

* Update cond_v2 design doc (tensorflow#70)

* Update to bring in line with implementation

* Added the symbol map to the RFC.

* Updated testing section of the Community site.

* Removed the 100%, formatting tweaks.

* Update CHARTER.md

* Change contact email address

I will leave my current company soon, so update my email.

* Create README.md

* Logos for SIGs

* Update README.md

* Update addons owners (tensorflow#85)

Add Yan Facai as another project lead.

* Created a FAQ for TF 2.0. (tensorflow#78)

Adding 2.0 related FAQ to the Testing group.

* Request and charter for SIG JVM (tensorflow#86)

Chartering docs for SIG JVM

* Update CODEOWNERS

Add @karllessard, @sjamesr and @tzolov as code owners for sigs/jvm.

* Update CODEOWNERS

Add missing /

* Update CODEOWNERS

Add @dynamicwebpaige as owner for sigs/testing/

* Update RFC with current information (tensorflow#89)

Make current to SIG Addons

* RFC: TF on Demand Project (tensorflow#69)

* Adding an RFC for TF on Demand Project.

* modified one line in tf-on-demand md file.

* Changing RFC status from PROPOSED to ACCEPTED.

* RFC: SavedModel Save/Load in 2.x (tensorflow#34)

* RFC for SavedModel Save/Load in 2.x

* Minor edits and a discussion topic for load() with multiple MetaGraphs

* Tweak to the "Imported representations of signatures" section

* Update "Importing existing SavedModels" with the .signatures change

* Update RFC and add review notes

* Status -> accepted

* Update CHARTER.md

New leads.

* Update 20180920-unify-rnn-interface.md (tensorflow#81)

Typo fix.

* Update yyyymmdd-rfc-template.md

Adding "user benefit" section into the RFC template, to encourage articulating the benefit to users in a clear way.

* Update while_v2 design doc (tensorflow#71)

* Update while_v2 design doc, include link to implementation

* Update TF 2.0 FAQ to link to TensorBoard TF 2.0 tutorial (tensorflow#94)

* CLN: update sig addons logo png (tensorflow#99)

* Add SIG Keras

Add a reference link to Keras' governance repository for SIG Keras.

* RFC: String Tensor Unification (tensorflow#91)

* RFC: String Tensor Unification

* Updated rfcs/20190411-string-unification.md

Updated TFLite sections to address feedback from @jdduke.  Marked as
Accepted.

* Start RFC for tensor buffers