
[ROCm] AMDGPU compiler fixes #41641

Merged

Conversation

ekuznetsov139 (Contributor)

This PR:
- Ensures that the AMDGPU compiler's temporary files are all different from instance to instance (important for multi-GPU training, e.g. with Horovod) and that they are deleted after compilation.
- Adds a HSACO cache to speed up the compilation process when compilation of multiple identical IRs is requested.

Deleting temporary files after compilation
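
For context, a minimal sketch of the kind of HSACO cache described above; the type name, keying, and locking here are illustrative assumptions, not the PR's actual code:

```cpp
#include <cstdint>
#include <mutex>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical cache mapping the textual LLVM IR (plus the target GPU
// architecture) to the compiled HSACO binary, so identical modules are
// only compiled once per process.
struct HsacoCache {
  std::mutex mu;
  std::unordered_map<std::string, std::vector<uint8_t>> entries;
  int64_t request_count = 0;
  int64_t hit_count = 0;

  // Returns true and fills `hsaco` if this (ir, gfx) pair was seen before.
  bool Find(const std::string& ir, const std::string& gfx,
            std::vector<uint8_t>* hsaco) {
    std::lock_guard<std::mutex> lock(mu);
    ++request_count;
    auto it = entries.find(gfx + "\n" + ir);
    if (it == entries.end()) return false;
    ++hit_count;
    *hsaco = it->second;
    return true;
  }

  void Add(const std::string& ir, const std::string& gfx,
           std::vector<uint8_t> hsaco) {
    std::lock_guard<std::mutex> lock(mu);
    entries.emplace(gfx + "\n" + ir, std::move(hsaco));
  }
};
```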
@google-ml-butler google-ml-butler bot added the size:M CL Change Size: Medium label Jul 22, 2020
@gbaned gbaned self-assigned this Jul 23, 2020
@gbaned gbaned added the comp:gpu GPU related issues label Jul 23, 2020
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Jul 23, 2020
@gbaned gbaned requested a review from chsigg July 29, 2020 17:48
@gbaned gbaned added the awaiting review Pull request awaiting review label Jul 29, 2020
g_hsacoCache.request_count++;
if (hit) g_hsacoCache.hit_count++;
if (!(g_hsacoCache.request_count % 50))
VLOG(0) << "HSACO cache: " << g_hsacoCache.request_count << " requests, "
Contributor:
Can you bump this to VLOG(1)?

Contributor Author:

Yes
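
For reference, the same counter block with the level bumped to VLOG(1) as requested might look like this; the tail of the log message is assumed, since the excerpt above is truncated:

```cpp
g_hsacoCache.request_count++;
if (hit) g_hsacoCache.hit_count++;
if (!(g_hsacoCache.request_count % 50))
  VLOG(1) << "HSACO cache: " << g_hsacoCache.request_count << " requests, "
          << g_hsacoCache.hit_count << " hits";  // message tail assumed
```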

@@ -584,18 +640,29 @@ StatusOr<std::vector<uint8>> EmitModuleToHsaco(
std::string tempdir_name = tempdir_vector.front();
VLOG(1) << "Compile-time artifacts located at: " << tempdir_name;

bool keep_tempfiles = false;
TF_CHECK_OK(tensorflow::ReadBoolFromEnvVar("TF_ROCM_XLA_TEMPFILES",
Contributor:

Would TF_ROCM_KEEP_XLA_TEMPFILES be clearer?

Contributor Author:

Sure
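
With the rename, the check might read roughly as follows, reusing the ReadBoolFromEnvVar pattern visible in the diff (the default value shown here is an assumption):

```cpp
bool keep_tempfiles = false;
TF_CHECK_OK(tensorflow::ReadBoolFromEnvVar("TF_ROCM_KEEP_XLA_TEMPFILES",
                                            /*default_val=*/false,
                                            &keep_tempfiles));
```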

@@ -584,18 +640,29 @@ StatusOr<std::vector<uint8>> EmitModuleToHsaco(
std::string tempdir_name = tempdir_vector.front();
Contributor:

Would it be possible (and better?) to pick a new sub-directory each time, instead of different file names?

Contributor Author:

I'm not sure what the benefit of that would be. It means more library calls, and all the files are going to be deleted anyway. Also, there's a theoretical possibility that we wouldn't be able to delete the subdirectory if it is still open in some subprocess when we get to the end of the function.
In any event, that is an additional feature and shouldn't go into this PR, I think.
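
For illustration, per-instance unique file names of the kind being discussed could be built like this; the helper name and the pid-plus-random-suffix scheme are assumptions, not the PR's exact approach:

```cpp
#include <random>
#include <string>
#include <unistd.h>  // getpid

#include "absl/strings/str_cat.h"

// Illustrative helper: build a file name that differs between processes
// (e.g. multiple Horovod ranks) and between calls within one process, so
// concurrent compilations never clobber each other's temporary files.
std::string UniqueTempName(const std::string& tempdir,
                           const std::string& prefix,
                           const std::string& suffix) {
  static thread_local std::mt19937_64 rng{std::random_device{}()};
  return absl::StrCat(tempdir, "/", prefix, "-", getpid(), "-", rng(), ".",
                      suffix);
}
```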

std::string ir_str;
llvm::raw_string_ostream stream(ir_str);
stream << *module;
std::string str = stream.str();
Contributor:

You can use ir_str, no need to copy it into str.

Contributor Author:

OK
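
The simplification being agreed to might look roughly like this, streaming the module straight into ir_str and dropping the extra copy:

```cpp
std::string ir_str;
llvm::raw_string_ostream stream(ir_str);
stream << *module;
stream.flush();  // make sure anything buffered lands in ir_str
// From here on, use ir_str directly; the extra copy into `str` goes away.
```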

std::string str = stream.str();
// Delete the first two lines, since they usually vary even when the rest of
// the code is the same (but verify that they are what we expect).
if (str.size() >= 13 && str.substr(0, 13) == "; ModuleID = ") {
Contributor:

Would it make sense to use llvm string processing helpers here, or even llvm::Regex?

Contributor Author:

The code does the job as written. Why bring in nonstandard APIs and add more complexity?
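
For context, the llvm::StringRef variant the reviewer presumably had in mind might look roughly like this (an illustrative sketch, not code from the PR; startswith is the LLVM-10-era spelling, later renamed starts_with):

```cpp
llvm::StringRef ir(ir_str);
if (ir.startswith("; ModuleID = ")) {
  // Skip the first two lines, which usually differ even when the rest of
  // the module is identical.
  for (int i = 0; i < 2; ++i) {
    size_t nl = ir.find('\n');
    if (nl == llvm::StringRef::npos) break;
    ir = ir.drop_front(nl + 1);
  }
}
```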

if (dump_lls) {
static int hsaco_count = 0;
char name[256];
sprintf(name, "/tmp/%d.ll", hsaco_count);
Contributor:

I don't think we should use C libraries here. Can you please modernize this block?

Contributor Author:

OK
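
A modernized version of that dump block using C++ streams instead of sprintf might look like this; the counter increment and what exactly gets written are assumptions, since the excerpt above is truncated:

```cpp
// Requires <fstream> and <string>.
if (dump_lls) {
  static int hsaco_count = 0;
  std::string name = "/tmp/" + std::to_string(hsaco_count++) + ".ll";
  std::ofstream ofs(name);
  ofs << ir_str;  // assumed: dump the IR text for this module
}
```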

@googlebot:

All (the pull request submitter and all commit authors) CLAs are signed, but one or more commits were authored or co-authored by someone other than the pull request submitter.

We need to confirm that all authors are ok with their commits being contributed to this project. Please have them confirm that by leaving a comment that contains only @googlebot I consent. in this pull request.

Note to project maintainer: There may be cases where the author cannot leave a comment, or the comment is not properly detected as consent. In those cases, you can manually confirm consent of the commit author(s), and set the cla label to yes (if enabled on your project).

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: no and removed cla: yes labels Aug 7, 2020
@googlebot:

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@googlebot googlebot added cla: yes and removed cla: no labels Aug 7, 2020
@tensorflowbutler tensorflowbutler removed the awaiting review Pull request awaiting review label Aug 9, 2020
@gbaned gbaned requested a review from chsigg August 13, 2020 17:16
@gbaned gbaned added the awaiting review Pull request awaiting review label Aug 13, 2020
PR Queue automation moved this from Assigned Reviewer to Approved by Reviewer Aug 19, 2020
@google-ml-butler google-ml-butler bot added the kokoro:force-run Tests on submitted change label Aug 19, 2020
@google-ml-butler google-ml-butler bot added the ready to pull PR ready for merge process label Aug 19, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 19, 2020
@rthadur rthadur removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Aug 19, 2020
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Aug 19, 2020
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Aug 19, 2020
@gbaned gbaned added ready to pull PR ready for merge process and removed ready to pull PR ready for merge process labels Aug 20, 2020
@tensorflow-copybara tensorflow-copybara merged commit 328cc0e into tensorflow:master Aug 21, 2020
PR Queue automation moved this from Approved by Reviewer to Merged Aug 21, 2020

Labels
cla: yes · comp:gpu (GPU related issues) · ready to pull (PR ready for merge process) · size:M (CL Change Size: Medium)

Projects
PR Queue: Merged

Development
Successfully merging this pull request may close these issues: none yet.

8 participants