Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change Redzone space limit for XLA GPU #54860

Closed
wants to merge 3 commits into from

Conversation

kaixih
Copy link
Contributor

@kaixih kaixih commented Mar 2, 2022

This PR changes how the redzone space limit is set in the XLA gpu conv algorithm picker.

  1. It sets the numeric max of int64 for the input/output allocator. So, we can have consistent behavior with the gemm picker.
  2. It allows the adjustment of the space limit for the scratch allocator via an env var. So, users can adjust it via XLA_FLAGS=--xla_gpu_redzone_scratch_max_megabytes=6144.

cc. @nluehr

@google-ml-butler google-ml-butler bot added the size:S CL Change Size: Small label Mar 2, 2022
@gbaned gbaned requested a review from chsigg March 3, 2022 12:26
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Mar 3, 2022
@gbaned gbaned added this to Assigned Reviewer in PR Queue via automation Mar 3, 2022
@chsigg chsigg requested review from cheshire and removed request for chsigg March 3, 2022 12:27
@cheshire
Copy link
Member

cheshire commented Mar 3, 2022

Hi @kaixih , could you provide some more context on what is the desired goal?

@gbaned gbaned removed the awaiting review Pull request awaiting review label Mar 4, 2022
@kaixih
Copy link
Contributor Author

kaixih commented Mar 7, 2022

@cheshire Sure. Basically, we found that the max space limit of redzone allocator for the XLA conv is set to be 4GB, which is insufficient for some models that expect large input/output tensors. In addition, we also noticed that this limit is not adjustable during runtime. So, compared to the XLA gemm, which has already set the limit of the input/output redzone allocator to the numeric max of int, we think it might be appropriate to follow it for the XLA conv. Moreover, we introduced a new env var to control the scratch redzone allocator max limit as well in case it needs to be adjusted.

PR Queue automation moved this from Assigned Reviewer to Reviewer Requested Changes Mar 7, 2022
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 7, 2022
PR Queue automation moved this from Reviewer Requested Changes to Approved by Reviewer Mar 7, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 7, 2022
@kaixih kaixih added the kokoro:force-run Tests on submitted change label Mar 7, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 7, 2022
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Mar 7, 2022
@gbaned gbaned requested a review from cheshire March 8, 2022 13:51
@google-ml-butler google-ml-butler bot added the awaiting review Pull request awaiting review label Mar 8, 2022
@google-ml-butler google-ml-butler bot added kokoro:force-run Tests on submitted change ready to pull PR ready for merge process labels Mar 8, 2022
@kokoro-team kokoro-team removed the kokoro:force-run Tests on submitted change label Mar 8, 2022
@gbaned gbaned added ready to pull PR ready for merge process and removed awaiting review Pull request awaiting review ready to pull PR ready for merge process labels Mar 9, 2022
@kaixih
Copy link
Contributor Author

kaixih commented Mar 10, 2022

@cheshire Can you help check what blocks the merging? It seems some "Google internal checks" failed. Thanks.

copybara-service bot pushed a commit that referenced this pull request Mar 14, 2022
@akuegel
Copy link
Member

akuegel commented Mar 14, 2022

There was a merge conflict in xla.proto, you used the same tag as was already used in a recent change. I fixed that and got your PR merged.

@gbaned
Copy link
Contributor

gbaned commented Mar 14, 2022

Seems auto-merge is not happening but the changes are merged into master now, so we can close this. Thank you for the PR.

@gbaned gbaned closed this Mar 14, 2022
PR Queue automation moved this from Approved by Reviewer to Closed/Rejected Mar 14, 2022
@google-ml-butler google-ml-butler bot removed the ready to pull PR ready for merge process label Mar 14, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:S CL Change Size: Small
Projects
PR Queue
  
Closed/Rejected
Development

Successfully merging this pull request may close these issues.

None yet

5 participants