
[ROCm] AMDGPU compiler fixes #41641

Merged

Conversation

ekuznetsov139 (Contributor)

This PR:
- Ensures that the AMDGPU compiler's temporary files are all different from instance to instance (important for multi-GPU training, e.g. with Horovod) and that they are deleted after compilation.
- Adds a HSACO cache to speed up the compilation process when compilation of multiple identical IRs is requested.

Deleting temporary files after compilation
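
For context, a minimal sketch of the kind of HSACO cache described above; the type name, keying, and locking here are illustrative assumptions, not the PR's actual code:

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache mapping the textual LLVM IR (plus the target GPU
// architecture) to the compiled HSACO binary, so identical modules are
// only compiled once per process.
struct HsacoCache {
  std::mutex mu;
  std::unordered_map<std::string, std::vector<uint8_t>> entries;
  int64_t request_count = 0;
  int64_t hit_count = 0;

  // Returns true and fills `hsaco` if this (ir, gfx) pair was seen before.
  bool Find(const std::string& ir, const std::string& gfx,
            std::vector<uint8_t>* hsaco) {
    std::lock_guard<std::mutex> lock(mu);
    ++request_count;
    auto it = entries.find(gfx + "\n" + ir);
    if (it == entries.end()) return false;
    ++hit_count;
    *hsaco = it->second;
    return true;
  }

  void Add(const std::string& ir, const std::string& gfx,
           std::vector<uint8_t> hsaco) {
    std::lock_guard<std::mutex> lock(mu);
    entries.emplace(gfx + "\n" + ir, std::move(hsaco));
  }
};
```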
@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Jul 22, 2020
@gbaned gbaned self-assigned this Jul 23, 2020
@gbaned gbaned added the comp:gpu GPU related issues label Jul 23, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 23, 2020
@gbaned gbaned requested a review from chsigg July 29, 2020 17:48
@gbaned gbaned added the awaiting review Pull request awaiting review label Jul 29, 2020
g_hsacoCache.request_count++;
if (hit) g_hsacoCache.hit_count++;
if (!(g_hsacoCache.request_count % 50))
VLOG(0) << "HSACO cache: " << g_hsacoCache.request_count << " requests, "
Contributor:
Can you bump this to VLOG(1)?

Contributor Author:

Yes
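
For reference, the same counter block with the level bumped to VLOG(1) as requested might look like this; the tail of the log message is assumed, since the excerpt above is truncated:

```cpp
g_hsacoCache.request_count++;
if (hit) g_hsacoCache.hit_count++;
if (!(g_hsacoCache.request_count % 50))
  VLOG(1) << "HSACO cache: " << g_hsacoCache.request_count << " requests, "
          << g_hsacoCache.hit_count << " hits";  // message tail assumed
```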

@@ -584,18 +640,29 @@ StatusOr<std::vector<uint8>> EmitModuleToHsaco(
std::string tempdir_name = tempdir_vector.front();
VLOG(1) << "Compile-time artifacts located at: " << tempdir_name;

bool keep_tempfiles = false;
TF_CHECK_OK(tensorflow::ReadBoolFromEnvVar("TF_ROCM_XLA_TEMPFILES",
Contributor:

Would TF_ROCM_KEEP_XLA_TEMPFILES be clearer?

Contributor Author:

Sure
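
With the rename, the check might read roughly as follows, reusing the ReadBoolFromEnvVar pattern visible in the diff (the default value shown here is an assumption):

```cpp
bool keep_tempfiles = false;
TF_CHECK_OK(tensorflow::ReadBoolFromEnvVar("TF_ROCM_KEEP_XLA_TEMPFILES",
                                            /*default_val=*/false,
                                            &keep_tempfiles));
```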

@@ -584,18 +640,29 @@ StatusOr<std::vector<uint8>> EmitModuleToHsaco(
std::string tempdir_name = tempdir_vector.front();
Contributor:

Would it be possible (and better?) to pick a new sub-directory each time, instead of different file names?

Contributor Author:

I'm not sure what the benefit of that would be. It means more library calls, and all the files are going to be deleted anyway. Also, there's a theoretical possibility that we wouldn't be able to delete the subdirectory if it is still open in some subprocess when we get to the end of the function.
In any event, that is an additional feature and shouldn't go into this PR, I think.
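
For illustration, per-instance unique file names of the kind being discussed could be built like this; the helper name and the pid-plus-random-suffix scheme are assumptions, not the PR's exact approach:

```cpp
#include <random>
#include <string>
#include <unistd.h>  // getpid

#include "absl/strings/str_cat.h"

// Illustrative helper: build a file name that differs between processes
// (e.g. multiple Horovod ranks) and between calls within one process, so
// concurrent compilations never clobber each other's temporary files.
std::string UniqueTempName(const std::string& tempdir,
                           const std::string& prefix,
                           const std::string& suffix) {
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  return absl::StrCat(tempdir, "/", prefix, "-", getpid(), "-", rng(), ".",
                      suffix);
}
```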

std::string ir_str;
llvm::raw_string_ostream stream(ir_str);
stream << *module;
std::string str = stream.str();
Contributor:

You can use ir_str, no need to copy it into str.

Contributor Author:

OK
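
The simplification being agreed to might look roughly like this, streaming the module straight into ir_str and dropping the extra copy:

```cpp
std::string ir_str;
llvm::raw_string_ostream stream(ir_str);
stream << *module;
stream.flush();  // make sure anything buffered lands in ir_str
// From here on, use ir_str directly; the extra copy into `str` goes away.
```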

std::string str = stream.str();
// Delete the first two lines, since they usually vary even when the rest of
// the code is the same (but verify that they are what we expect).
if (str.size() >= 13 && str.substr(0, 13) == "; ModuleID = ") {
Contributor:

Would it make sense to use llvm string processing helpers here, or even llvm::Regex?

Contributor Author:

The code does the job as written. Why bring in nonstandard APIs and add more complexity?
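
For context, the llvm::StringRef variant the reviewer presumably had in mind might look roughly like this (an illustrative sketch, not code from the PR; startswith is the LLVM-10-era spelling, later renamed starts_with):

```cpp
llvm::StringRef ir(ir_str);
if (ir.startswith("; ModuleID = ")) {
  // Skip the first two lines, which usually differ even when the rest of
  // the module is identical.
  for (int i = 0; i < 2; ++i) {
    size_t nl = ir.find('\n');
    if (nl == llvm::StringRef::npos) break;
    ir = ir.drop_front(nl + 1);
  }
}
```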

if (dump_lls) {
static int hsaco_count = 0;
char name[256];
sprintf(name, "/tmp/%d.ll", hsaco_count);
Contributor:

I don't think we should use C libraries here. Can you please modernize this block?

Contributor Author:

OK
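
A modernized version of that dump block using C++ streams instead of sprintf might look like this; the counter increment and what exactly gets written are assumptions, since the excerpt above is truncated:

```cpp
// Requires <fstream> and <string>.
if (dump_lls) {
  static int hsaco_count = 0;
  std::string name = "/tmp/" + std::to_string(hsaco_count++) + ".ll";
  std::ofstream ofs(name);
  ofs << ir_str;  // assumed: dump the IR text for this module
}
```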

@googlebot:

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes labels Aug 7, 2020
@googlebot:

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no labels Aug 7, 2020
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Aug 9, 2020
@gbaned gbaned requested a review from chsigg August 13, 2020 17:16
@gbaned gbaned added the awaiting review Pull request awaiting review label Aug 13, 2020
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Aug 19, 2020
@google-ml-butler google-ml-butler bot added the kokoro:force-run Tests on submitted change label Aug 19, 2020
@google-ml-butler google-ml-butler bot added the ready to pull PR ready for merge process label Aug 19, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 19, 2020
@rthadur rthadur removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Aug 19, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 19, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 19, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 20, 2020
@tensorflow-copybara tensorflow-copybara merged commit 328cc0e into tensorflow:master Aug 21, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Aug 21, 2020

Labels
cla: yes · comp:gpu (GPU related issues) · ready to pull (PR ready for merge process) · size:M (CL Change Size: Medium)

Projects
PR Queue: Merged

Development
Successfully merging this pull request may close these issues: none yet.

8 participants