
XLA Asynchronous compilation #43034

Merged

Conversation

@bas-aarts commented Sep 8, 2020:

XLA Asynchronous compilation
1) add an option to opt into asynchronous compilation
2) asynchronous compilation uses a dedicated pool of threads to start compiling a cluster instance while the fallback path is executed
3) limit the number of ongoing compilations to a fixed threshold

change some VLOG levels to make level 2 less verbose

@google-ml-butler google-ml-butler bot added the size:L CL Change Size: Large label Sep 8, 2020
@gbaned gbaned self-assigned this Sep 8, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Sep 8, 2020
@gbaned gbaned requested a review from cheshire September 8, 2020 05:13
@bas-aarts bas-aarts changed the title Changes in lazy XLA compilation XLA Asynchronous compilation Sep 10, 2020
@gbaned gbaned added the awaiting review Pull request awaiting review label Sep 11, 2020
@gbaned gbaned requested a review from sanjoy September 23, 2020 11:47
tensorflow/compiler/jit/xla_compilation_cache.cc (two outdated review threads, resolved)
Entry tmp;
VLOG(2) << "Starting asynchronous compilation of cluster "
<< function_name << '.';
(void)CompileStrict(&tmp, options, args, function_name, compile_fn);
Contributor:

I don't think we can ignore the error, we need to report it back to the user. IMO the right solution is to store the Status in entry.

Author @bas-aarts commented Oct 8, 2020:

I need a little help understanding a particular error:
TF_RETURN_IF_ERROR(
BroadcastXlaActivity(std::move(jit_compilation_activity)));
(what does this error mean?)
In the original code, this error can trigger after compilation. At that point the Entry has been updated and the compilation results are already stored in the cache. When this error triggers, the compilation is ignored for this call only; subsequent calls will retrieve the prior compilation result.
So even though this error is triggered, the compilation passed, and "in the future" the result can be used.

I mimicked that behavior with this change. When the compilation fails, the Entry is populated with that information, as done before. If the above error triggers, the compilation results have already been stored. Since this is an asynchronous compilation, the fallback path has already been chosen, which matches the original behavior.

tensorflow/compiler/jit/xla_compilation_cache.h (four outdated review threads, resolved)
// The number of times a lazy compilation must be requested for a specific
// signature before we attempt to compile it.
static constexpr int64 kDefaultCompilationThreshold = 2;
static constexpr int64 kDefaultCompilationThreshold = 3;
Contributor:

This is a separate change, right?

Author @bas-aarts commented Oct 8, 2020:

Yes. Should I leave this out? This was part of the change that modifies the compilation heuristic (see commit comment).

Contributor:

Yes, let's leave this out. IMO we should do this only if async compilation is enabled, assuming that it makes sense only when async compilation is enabled.

Author:

I dropped the change that modifies the compilation protocol.

tensorflow/compiler/jit/xla_platform_info.h (outdated review thread, resolved)
PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Sep 24, 2020
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Sep 26, 2020
@gbaned (Contributor) commented Sep 30, 2020:

@bas-aarts Can you please check @sanjoy's comments and keep us posted? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Oct 6, 2020
@gbaned gbaned requested a review from sanjoy October 9, 2020 14:18
@gbaned gbaned added awaiting review Pull request awaiting review and removed stat:awaiting response Status - Awaiting response from author labels Oct 9, 2020
@@ -166,8 +166,8 @@ static Status CompileToLocalExecutable(
const XlaPlatformInfo& platform_info,
absl::Span<const Tensor* const> inputs,
absl::Span<VariableInfo const> variable_infos,
absl::Span<const int> constants, bool lazy, bool may_alias_resource_update,
xla::LocalClient** client,
absl::Span<const int> constants, bool async, bool lazy,
Contributor:

Instead of passing around two bools, I'd prefer passing around a XlaCompilationCache::CompileMode (i.e. convert the pair of bools into XlaCompilationCache::CompileMode much earlier in the process).

Author:

done

tensorflow/compiler/jit/kernels/xla_ops.cc (two outdated review threads, resolved)
Comment on lines 189 to 205
// compile_fn can be called asynchronously. Make sure all required arguments
// are passed by value.
auto compile_fn = [&, compile_options, function](
XlaCompiler* compiler,
const std::vector<XlaCompiler::Argument>& args,
Contributor:

I think this is a bit too subtle, can you please create a struct with an operator() that explicitly captures all state?

Author:

@sanjoy, I'm not seeing how you want me to use the operator().
Please add some pseudocode to show the intent.

Contributor:

I meant writing this as:

struct CompileFunctor {
  CompileOptions compile_options;
  // All the other state that is needed
  
  Status operator()(<args>) { ... }
};

And create an instance of CompileFunctor and pass that to CompileImpl. The difference is that the state captured is now explicit. In the current version it will be easy for a later change to introduce the use of a local variable in the body of the lambda.

Author @bas-aarts commented Jan 12, 2021:

Passing an additional argument to CompileImpl would require many additional functions to change as well; capture by value is cleaner.
I added a POD type instead of a class with an operator() to explicitly capture the required variables, added additional comments for clarity, and, most importantly, removed the capture-default to prevent additional variables from being introduced in the body of the lambda.

// compilation.
// Passing args by value as well. Doing this here only when an asynchronous
// compilation is performed, as copying many args incurs an overhead.
async_compilation_.compiler_threads.Schedule([=] {
Contributor:

Again, I think this is a bit too implicit. Can you instead create a struct that explicitly captures state and exposes an operator(), or find some other way to make the captured state explicit?

Author:

Changed the comment here. Since everything is passed by value, there is no need for a struct to encapsulate anything (which would be more error prone if new arguments had to be passed by value).

Contributor:

Passing things by value does not guarantee stability though, since you could be passing a soon-to-be-stale pointer by value.

Creating a struct that has the state explicitly threaded through (like in CompileFunctor above) will make it easier to spot use-after-free bugs.

Author:

I added a POD type instead of a class with an operator() to explicitly capture the required variables, added additional comments for clarity, and, most importantly, removed the capture-default to prevent additional variables from being introduced in the body of the lambda.

Comment on lines 453 to 455
string function_name = function.name();
string human_signature = VLOG_IS_ON(3) ? signature.HumanString() : function_name;
VLOG(2) << "Signature: " << human_signature;
Contributor:

Let's put all of this under VLOG_IS_ON(2) to avoid unnecessary copies.

Author:

done

@@ -519,6 +521,7 @@ def simpleTest(self, arg0, arg1, global_jit_level):

class LazyCompilationTest(test.TestCase):

@unittest.skip("test too dependant on XLA compilation protocol")
Contributor:

This is by design -- the test is testing the XLA compilation protocol. :)

I think the test needs to be adjusted to adapt to whatever the new scheme is.

Author:

I removed the change that changes the compilation protocol

tensorflow/compiler/tf2xla/xla_compiler.h (review thread, resolved)
tensorflow/compiler/xla/service/backend.h (review thread, resolved)
@gbaned gbaned removed the awaiting review Pull request awaiting review label Oct 23, 2020
@gbaned (Contributor) commented Oct 23, 2020:

@bas-aarts Can you please check @sanjoy's comments and keep us posted? Thanks!

@gbaned gbaned added the stat:awaiting response Status - Awaiting response from author label Oct 23, 2020
@gbaned (Contributor) commented Nov 5, 2020:

@bas-aarts Any update on this PR, please? Thanks!

@tensorflowbutler tensorflowbutler added the stale This label marks the issue/pr stale - to be closed automatically if no activity label Nov 22, 2020
@bas-aarts (Author):

This PR is still being worked on.

@google-ml-butler google-ml-butler bot removed the stale This label marks the issue/pr stale - to be closed automatically if no activity label Nov 23, 2020
@bas-aarts left a comment:

Rebased and addressed many comments

tensorflow/compiler/jit/xla_compilation_cache.cc (outdated review thread, resolved)
@gbaned gbaned requested a review from sanjoy November 25, 2020 09:06
@gbaned gbaned removed the stat:awaiting response Status - Awaiting response from author label Nov 25, 2020
@gbaned gbaned added the awaiting review Pull request awaiting review label Dec 4, 2020
@@ -166,8 +166,8 @@ static Status CompileToLocalExecutable(
const XlaPlatformInfo& platform_info,
absl::Span<const Tensor* const> inputs,
absl::Span<VariableInfo const> variable_infos,
absl::Span<const int> constants, bool lazy, bool may_alias_resource_update,
xla::LocalClient** client,
absl::Span<const int> constants, XlaCompilationCache::CompileMode cmode,
Contributor:

Let's call this compile_mode.

Author:

done

Comment on lines 379 to 392
// If must_compile_ is true, there is no fallback path and therefore
// async and lazy must be false. If must_compile_ is false, and async
// compilation is enabled, async is true, and lazy is false. Otherwise
// lazy compilation is true.
bool async = !must_compile_ &&
GetXlaOpsCommonFlags().tf_xla_async_compilation;
// Possible future work:
// disable async for small clusters.
// disable async for cluster that have short compile time.
bool lazy = async ? false : !must_compile_;
XlaCompilationCache::CompileMode cmode =
lazy ? XlaCompilationCache::CompileMode::kLazy :
async ? XlaCompilationCache::CompileMode::kAsync :
XlaCompilationCache::CompileMode::kStrict;
Contributor:

IMO this will be easier to read and self documenting if we write it as:

XlaCompilationCache::CompileMode compile_mode = [&] {
  if (must_compile_) { return kStrict; }
  return GetXlaOpsCommonFlags().tf_xla_async_compilation ? kAsync : kLazy;
}();

Author @bas-aarts commented Jan 12, 2021:

done



@@ -142,7 +142,9 @@ Backend::Backend(se::Platform* platform, Compiler* compiler,
}
}

Backend::~Backend() {}
Backend::~Backend() {
CHECK_EQ(memory_allocator_.use_count(), 1);
Contributor:

Why is this true? What prevents a compilation from running concurrently with backend destruction? (Please add a comment.)

Author:

The check is not needed. The memory allocator is stored in a shared_ptr, so even if the Backend is destroyed, an in-flight compilation will still succeed.

@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Feb 26, 2021
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Feb 26, 2021
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Feb 26, 2021
@gbaned gbaned added ready to pull PR ready for merge process and removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Feb 26, 2021
@bas-aarts (Author):

Just noticed this change has not yet been merged. Any reason for this?

@sanjoy sanjoy removed the request for review from cheshire March 10, 2021 07:33
@sanjoy (Contributor) commented Mar 10, 2021:

Just noticed this change has not yet been merged. Any reason for this?

Not clear, @gbaned do you know what's going on here?

@gbaned (Contributor) commented Mar 10, 2021:

Just noticed this change has not yet been merged. Any reason for this?

Not clear, @gbaned do you know what's going on here?

@sanjoy Internal test failures are appearing in the CL. Can you please take a look? Thank you!

@copybara-service copybara-service bot merged commit 7005f42 into tensorflow:master Mar 16, 2021
PR Queue automation moved this from Approved by Reviewer to Merged Mar 16, 2021
@bas-aarts bas-aarts deleted the bas-devel-async-compilation branch March 20, 2021 01:03
Labels: cla: yes · ready to pull (PR ready for merge process) · size:L (CL Change Size: Large)
Projects: PR Queue (Merged)
7 participants