When GlobalJitLevel is on, disable the Grappler memory opt. #32245
Conversation
This relaxes the disable check and should be a slightly better behavior, as users still have some ways to enable the memory optimizer when they want to.
One question: this PR can only check the ConfigProto and cannot check the env var flags, as those only exist in the XLA world. Do you think this is an issue? Any suggestions to address it?
Maybe call getenv and parse the env variable?
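For illustration, the getenv route suggested here might look like the following minimal sketch. The TF_XLA_FLAGS variable and the --tf_xla_auto_jit flag do exist in the XLA world, but the parsing below is deliberately simplified and the function name is made up for this example; it is not the actual TensorFlow implementation.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical sketch of parsing an auto-jit level out of the process
// environment. Real XLA flag parsing is more involved; this only shows
// the getenv-and-parse idea discussed above.
int ParseAutoJitLevelFromEnv() {
  const char* flags = std::getenv("TF_XLA_FLAGS");
  if (flags == nullptr) return 0;  // variable absent: treat JIT as off
  const std::string s(flags);
  const std::string key = "--tf_xla_auto_jit=";
  const std::size_t pos = s.find(key);
  if (pos == std::string::npos) return 0;  // flag not present
  // atoi returns 0 for non-numeric values such as "single-gpu(2)".
  return std::atoi(s.c_str() + pos + key.size());
}
```

One drawback, raised later in this thread, is that this duplicates parsing logic that already lives in the XLA flag code.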
I think we should just expose it. @rmlarsen WDYT?
It is more than that: if we want to move the function, we will have to move several XLA-flag-related files with it. Considering that, below are the two options we have.
Personally, I am slightly leaning towards option 1, but please give some guidance. Thanks.
IMHO, re-implementing the parsing logic in TF may be ugly, because there would then be two separate places to maintain for future env-variable changes.
Env vars are less elegant, but they're super convenient when trying out XLA ("no need to make code changes, just flip this flag"), so I'd like to make this work with env vars if possible. Another possibility is to do some form of dependency injection via a registry: allow the XLA JIT to register a callback that lets TF query whether the XLA global JIT is enabled. This will be more code but should be cleaner overall.
Good suggestion. Let's proceed with this route.
Callbacks can be registered with this class so that runtime environment flags can be parsed to change configs in the TensorFlow core. A primary use is for the TensorFlow core to query the XLA JIT level, which can be configured by runtime environment flags in addition to the ConfigProto.
I implemented a config proxy in the TensorFlow core per our previous discussion. Please take a look when you have a moment. Thanks.
// Register callbacks.
XlaConfigProxy::ConfigSetterRegistration<ConfigA>
    contrig_setter_registration_a([](ConfigA& config_a) -> bool {
contrib_setter_registration_a
Excuse me for the typo. Will fix it immediately.
namespace tensorflow {

// A proxy class of XLA config.
class XlaConfigProxy {
I'm not sure I understand how you're planning to use this, do you mind elaborating a bit?
Here is what I had in mind though: have a registry that lets XLA register an std::function<OptimizerOptions::GlobalJitLevel(const GraphOptimizationPassOptions& options)>. Then Grappler can check whether XLA is enabled using this registered std::function; if nothing is registered, it can assume that XLA is not enabled.
Very similar, just that we need a singleton (or at least some static functions) for registering this std::function callback. And I slightly generalized the registry class into ConfigSetterRegistry.

ConfigSetterRegistry is a singleton utility class for registering a std::function<bool(ConfigType&)>; that is, it allows a callback to be registered for each config type. In the current case, XLA registers a std::function<bool(GlobalJitLevel&)>, and TF can then invoke this std::function to update the GlobalJitLevel value.

Perhaps the generalization hurts readability. If that is the case, let me remove ConfigSetterRegistry and simply make XlaConfigProxy a singleton. Please feel free to let me know what you think.
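As a rough illustration of the ConfigSetterRegistry idea described here (the names and signatures below are assumptions modeled on this comment, not the actual TensorFlow code):

```cpp
#include <functional>
#include <utility>

// Sketch: a per-config-type singleton that stores one
// std::function<bool(ConfigType&)> callback. XLA-side code registers the
// callback; TF core invokes it to (possibly) update the config value.
template <typename ConfigType>
class ConfigSetterRegistry {
 public:
  static ConfigSetterRegistry& Global() {
    static ConfigSetterRegistry instance;  // Meyers singleton
    return instance;
  }

  // Called from XLA to install the setter for ConfigType.
  void Register(std::function<bool(ConfigType&)> setter) {
    setter_ = std::move(setter);
  }

  // Called from TF core; returns false when nothing is registered,
  // which callers can treat as "XLA is not enabled".
  bool Update(ConfigType& config) {
    if (!setter_) return false;
    return setter_(config);
  }

 private:
  std::function<bool(ConfigType&)> setter_;
};

// Example config type standing in for OptimizerOptions::GlobalJitLevel.
enum class JitLevel { kOff, kOn };
```

The singleton-per-type shape is what lets each config type carry its own independent callback.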
I think this generality is "YAGNI" for now.
I see your point.
Thanks for the review. Response inlined.
Excuse me for the typo. Will fix it immediately.
Also, rename XlaConfigProxy to XlaConfigRegistry.
Updated. Please take another look. Thanks.
    OptimizerOptions::GlobalJitLevel jit_level_in_session_opts) {
  XlaGlobalJitLevel xla_global_jit_level =
      GetXlaGlobalJitLevel(jit_level_in_session_opts);
  // Take the general flag to avoid the dependency on tensorflow::Graph.
I simply ignore the single_graph flags, as tensorflow::Graph is not accessible in InitializeOptimizers(), where Grappler configures its optimizers.

It makes sense to me to simply expose the general flag (although the general and single_gpu flags are currently the same). Let me know what you think about this, though.
I'm worried that this makes XlaConfigRegistry::GetGlobalJitLevel "look" misleading.

How about this: we make the callback return an instance of XlaGlobalJitLevel. Then in optimizers/meta_optimizer.cc we do:
XlaGlobalJitLevel xla_global_jit_level = ...;
if (xla_global_jit_level.general is ON) {
  Disable memory opt
}
// We don't care about single GPU because the regression happens only on multi-GPU
This is a bit of a hack, because if the graph happens to be single-GPU but the general auto-jit is enabled, we're still going to disable the optimization; but to get us started, I think this is fine. Behaviorally this is identical to what you have, but I think it is clearer.
Also a bit more context: if only single_gpu is enabled, we absolutely don't want any behavior change on multi-GPU graphs. We're using --tf_xla_auto_jit=single-gpu(2) to stage the rollout of XLA, and we want --tf_xla_auto_jit=single-gpu(2) to affect only single-GPU graphs.
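The check proposed above might be sketched like this. The XlaGlobalJitLevel layout and the helper name are assumptions made for illustration, not the actual TensorFlow types:

```cpp
// Sketch of the proposed check for optimizers/meta_optimizer.cc: disable
// the memory optimizer only when the *general* auto-jit is on, so that
// --tf_xla_auto_jit=single-gpu(2) never changes multi-GPU behavior.
struct XlaGlobalJitLevel {
  bool general = false;     // auto-jit requested for all graphs
  bool single_gpu = false;  // auto-jit requested for single-GPU graphs only
};

bool ShouldDisableMemoryOptimizer(const XlaGlobalJitLevel& jit) {
  return jit.general;
}
```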
I see your point. Thanks for the good feedback.

I made some changes:
- Made the callback return XlaGlobalJitLevel, as this way is indeed less ambiguous.
- Added a check in the MetaOptimizer such that the memory optimizer is turned off only when JIT is on for both single-GPU and multi-GPU graphs. This is a conservative approach and should be good enough for now.

Thanks for the context about how the single-gpu flag is used. It was not clear to me before, and the plan looks good to me.
Note that the swap-in/out latency issue should also exist on a single GPU, since the copy stream is different from the device compute stream in the default TensorFlow setting (i.e., concurrency between streams can be lost).

A bit more context: the memory optimizer can be triggered on a single-GPU graph as long as the estimated device memory pressure is high. In the report, it is just that the inserted Horovod nodes increase the memory pressure enough to trigger the swap-in/out insertion condition; it does not mean that the memory optimization cannot be triggered on a single-GPU graph.
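The conservative check described above (memory optimizer off only when JIT is on for both single-GPU and multi-GPU graphs) could be sketched as follows; again, the struct layout and function name are illustrative assumptions, not the actual TensorFlow code:

```cpp
// Sketch of the final, conservative check: the memory optimizer is turned
// off only when JIT is on for both single-GPU and general (multi-GPU)
// graphs, so that neither flag alone changes behavior unexpectedly.
struct XlaGlobalJitLevel {
  bool general = false;
  bool single_gpu = false;
};

bool ShouldDisableMemoryOptimizer(const XlaGlobalJitLevel& jit) {
  return jit.general && jit.single_gpu;
}
```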
I'm taking care of the merge. Sorry for the delay; I'm a bit busier than usual.

Got it. No worries, take your time. Thanks.
PiperOrigin-RevId: 270071858
@trentlo @goldiegadde I am not sure whether this PR caught the r2.0 cherry-pick cycle. It does seem to fix a performance regression when running the official ResNet50 model w/ XLA on 8 GPUs. I observed a significant performance boost (~3900 image/s to ~7500 image/s) comparing the 2.0 nightly packages from 2019-09-19 (wo/ this PR) and 2019-09-20 (w/ this PR). The benchmark I am running:
Ping @tfboyd to confirm. Several other CLs on the same day are suspicious, but my hunch is this one. EDIT: my guess is wrong. It turns out that 59436c1, which reverts 53bdcf5, improves the performance greatly.
I will cherry-pick this on my own r2.0 branch and report the numbers tomorrow.
@byronyi FYI as below. Indeed, cherry-picking the commits and measuring the performance difference is the most direct way to verify. A quick way to check is to use the following flags. It dumps a bunch of logged files into
Then you can use
I am seeing the same bounce back in the nightly tests. XLA stated they were not focused on multi-GPU tests, as their focus was XLA-on-by-default for a single GPU, so I stopped bothering them with the results. The drop to ~3,900 occurred around 03-AUG in my testing. I see the bounce back to 8,500 with 2.0.0-dev20190922. This is still not back to the ~9K we saw before the drop. I doubt this will get cherry-picked into TF 2.0, as XLA is still experimental and RC2 is likely to be the last RC needed before release.
@tfboyd It turns out that 53bdcf5 was the commit causing the bad performance in the nightly package, not the issue this PR fixes. That CL was reverted in 59436c1, so performance went back to ~7,600 in our environment. So there must be some other changes causing performance degradation in r2.0. Do you mind pointing me to a snapshot with the expected performance, i.e., from before the drop to ~3,900 around 03-AUG?
@byronyi Here is the info I have that may be of the most use. It would be nice to get back to the 9K, but it is also possible that too many other changes have occurred. I only run on nightly, and only when nightly works. Command for the test (you would need a copy of ImageNet):

# Flags
--batch_size=2048 --data_dir=/data/imagenet --dtype=fp16 --enable_eager --enable_xla --log_steps=10 --model_dir=/workspace/benchmarks/perfzero/workspace/output/2019-08-02-12-50-22-843844/benchmark_xla_8_gpu_fp16 --num_gpus=8 --noreport_accuracy_metrics --skip_eval --train_steps=110
# [PerfZero](https://github.com/tensorflow/benchmarks/tree/master/perfzero) command. I suggest using PerfZero when
# possible, as it helps ensure your args are the same and a common docker is used for repeatability.
# Not perfect, but lightweight and improving.
python3 /workspace/benchmarks/perfzero/lib/benchmark.py --benchmark_methods=official.benchmark.keras_imagenet_benchmark.Resnet50KerasBenchmarkReal.benchmark_xla_8_gpu_fp16 --data_downloads=gs://tf-perf-imagenet-uswest1/tensorflow/imagenet --python_path=models --git_repos="https://github.com/tensorflow/models.git;master;6b586a910d74a44f57da4d2335c79a20dc2803ab"

GOOD BAD Current

It might be hard to bisect, as I think the model garden code changed significantly between the two dates with the following lines, and we also moved the code around recently, so running the test now vs. back in August uses a different directory:

# TODO(b/138957587): Remove when force_v2_in_keras_compile is no longer
# a valid arg for this model. Also remove as a valid flag.
if flags_obj.force_v2_in_keras_compile is not None:
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=optimizer,
      metrics=(['sparse_categorical_accuracy']
               if flags_obj.report_accuracy_metrics else None),
      run_eagerly=flags_obj.run_eagerly,
      experimental_run_tf_function=flags_obj.force_v2_in_keras_compile)
else:
  model.compile(
      loss='sparse_categorical_crossentropy',
      optimizer=optimizer,
      metrics=(['sparse_categorical_accuracy']
               if flags_obj.report_accuracy_metrics else None),
      run_eagerly=flags_obj.run_eagerly)

@sganeshb has taken over testing. I am not involved in TensorFlow performance going forward.
Thanks, Toby @tfboyd! That's much appreciated.
@byronyi Wanted you to know that I read your findings, and nice work tracking it down. While not good, your work shows the pain of finding old issues; they often end up layered. I would not want you to think your bisecting work was not looked at. It was a rough situation. The XLA team is/was focused only on single-GPU performance so they could make XLA on by default for that scenario. They seem to be looking more holistically now. I am not sure there is anything we can do about it at this point other than move forward with TensorFlow 2.1. FYI, TensorFlow 2.0 100% has the XLA regression for multi-GPU.
Thanks, Toby! I admit it could be difficult, but I am working hard with my colleagues to fix these regressions for our internal 2.0 fork. Personally, I have been looking forward to it for too long. We just can't sit back and watch our users have to choose between 2.0 w/ new features and old 1.x wo/ perf regressions. We'd really like to have our cake and eat it :)
@byronyi Not irrelevant at all. I have said before: if you can get, say, 10K examples/sec on 8x V100s with FP16 + XLA, but you need to scale to 32 GPUs (making up a number) to do that with FP32, then why are you focused on multi-node and wasting money? Figure out FP16. There are exceptions, for sure; this is a simple mental example. The other point being: if you scale up 1 GPU and 8 GPUs (whatever your single node is), then you are scaling up everything else.
I also agree. It is not cool that this exists in 1.x. It came down to the XLA team not wanting to focus on multi-GPU, and we stopped TF 1.x testing maybe too early. I do not recall the final data, but I stopped nightly perf tests on TF 1.x a couple of months ago; it was just too much for one person, and there was a lack of enthusiasm to resolve issues. :-( Your job/role sounds really cool. I wish we had these groups when I started. I did a lot with multi-node and desperately wanted places to test. All I had was AWS and K80s... yup, a long time ago. I eventually just gave up, as it was pointless; the P100s came out and I did not have access. A very long story not approved for the public. I was ahead of schedule and burned out when the time came. Super exciting.
@tfboyd The aforementioned regression seems to be caused by ef9f0e8, identified by bisecting between the 08-01 and 08-04 nightlies. This commit is also present in both r1.15 and r2.0. I am testing with model garden version 127b158. The benchmark I am running is:
With this CL the performance drops from ~7300 image/s to ~2000 image/s, tested on an 8x V100-PCIE machine (balanced bus topology) using NCCL and MirroredStrategy. The nightly 1.x TF packages are built using the following setup: OS: Debian 9.9
I will test by reverting it on the release branches. In the meantime, I'd like to ping @rachellim to see if she has any quick fixes we could cherry-pick instead of reverting the offending commit. UPDATE: reverting the following commits improves 8x V100 performance from ~1,600 to ~7,700 image/s on the latest r2.0 branch snapshot 64c3d38: Ping @goldiegadde; is it still possible to revert these commits for 2.0? It will be very important for professional users to be able to reproduce the expected multi-GPU ResNet50 performance numbers. I build the 2.0 package using the following options:
Does 96a407a fix the performance issue? If so, maybe we can cherry-pick it into the release?
@rachellim Assuming it does fix the issue cleanly as a cherry-pick: it will not get into TF 2.0, as we are at the end after many weeks. XLA is an experimental feature, and doing a cherry-pick for 1.15 (same story with 2.0), while it may not seem risky, is likely not justified from a risk/reward standpoint. All of that negativity said, you can 100% petition the release owner. I am an XLA fan, for additional context.
@rachellim Cherry-picking 96a407a along with 6d8f05a does seem to fix the performance regression. Ping @jsimsa; any possibility we get these fixes into 2.0.1? Btw, for r1.15, cherry-picking 96a407a alone seems fine. I will test the performance and see how it goes. Ping @goldiegadde; at least we can still cherry-pick for r1.15, right?
Edit: Actually, I'll defer to the release owner @goldiegadde on how to handle this for both 1.15 and 2.x, since this PR involves a pretty significant code change (including behavior changes to datasets with distribution strategy).
This commit disables the Grappler memory optimizer when the XLA JIT is detected to be on.

The (current) XLA clustering can result in a loss of concurrency between kernel compute and memory copies. As such, it usually loses the concurrency needed to hide the latencies of the inserted swap-ins and swap-outs, and it incurs a large performance overhead.
You may find more details about the performance degradation in this document:
https://docs.google.com/document/d/1q1UPN2CRRNBoUXM0zORT-cTG9m7ctQRB2xfUPhaK1Ek/edit?usp=sharing
@sanjoy, please also take a look.