
Mention TF_FORCE_GPU_ALLOW_GROWTH in using_gpu.md #558

Merged
merged 4 commits into tensorflow:master from lmartak:patch-1 on May 8, 2019

@lmartak
Contributor

commented May 7, 2019

As mentioned in [this comment](tensorflow/tensorflow@5d20470#commitcomment-33445269) on the commit that introduced this functionality, I'd like to have it mentioned here, as this is the most relevant place for it that I could find.

Since I couldn't find any information about whether this feature is intended to be kept around and integrated into TF2, or whether its future is less certain, I'd like to at least start the discussion through this PR and perhaps advocate for it, as I recently found it most useful and would really appreciate it being standardized and propagated into future stable TF2 releases.

Thanks for any reply!


@lmartak lmartak requested review from lamberta and MarkDaoust as code owners May 7, 2019

@googlebot


commented May 7, 2019

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here (e.g. I signed it!) and we'll verify it.



@googlebot googlebot added the cla: no label May 7, 2019

@lmartak

Contributor Author

commented May 7, 2019

I signed it!

@googlebot


commented May 7, 2019

CLAs look good, thanks!


@googlebot googlebot added cla: yes and removed cla: no labels May 7, 2019

@lamberta

Member

commented May 7, 2019

Thanks, Lukas! Looks handy, but I'm wondering if the platform-specific nature of this is why we're not surfacing it in the docs. Any insights into this, @tfboyd?

As for inquiring about its status in TF 2, that would be a better question for either the testing@tensorflow.org or developers@tensorflow.org groups.

@lamberta lamberta added the question label May 7, 2019

@tfboyd

Member

commented May 7, 2019

I may be totally missing the point, so please correct me. I do not think we want to share another way of turning on the GPU memory growth feature; the config proto is what we want people to use, not the environment variable. If there is some advantage to the env var beyond personal preference, please let me know and we can sort this out.
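
For context, a minimal sketch of the config-proto route being referred to, using the TF 1.x API (illustrative, not taken from this PR's diff):

```python
import tensorflow as tf

# TF 1.x: enable GPU memory growth for this process via the session config.
config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # allocate GPU memory on demand
sess = tf.Session(config=config)
```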

@lmartak

Contributor Author

commented May 7, 2019

@lamberta Thanks for the links, I'll ask about the future plans there.

@tfboyd Thanks for the comment. I understand your concerns; let me clarify. Also, apologies for the length, and many thanks for your time if you're going to read all of this!

I am not sure about the design decisions that led to the implementation of this feature, so I want to clarify how much one can rely on it in the future. But since it is there, and useful at least to system admins (e.g. people configuring runtimes served through Colab or any other multi-user or multi-instance system), it might as well be documented somewhere, as other people administrating multi-user systems might want to make use of it.

> I do not think we want to share another way of turning on the GPU growth feature.

I'd disagree that this is just another way of doing the same thing. It is rather the only way to persistently set a default for the feature. That is, conceptually, a different thing to do, and it is especially useful when high-level libraries such as tf.keras or tf.estimator start meddling with the existing user-configured session under the hood in undocumented ways, breaking the config along the way (own experience; material for a separate issue).

> some advantage to the ENV_VAR

On a single-GPU multi-user system, one can use this feature to set a system-wide default instead of forcing every user to include a code snippet that changes the TF-enforced default (disabled) to the one that is actually preferred (enabled), and to ensure that snippet is present in all of their projects at every spot where the default/active session is manipulated.
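
For illustration, a minimal sketch of the environment-variable route; setting it from Python like this is a hypothetical per-process use, while the point above is that an admin can export TF_FORCE_GPU_ALLOW_GROWTH=true once in the system (or JupyterHub) environment so every user's process inherits it:

```python
import os

# Must be set before TensorFlow initializes the GPU devices;
# the runtime checks for the literal value "true".
os.environ["TF_FORCE_GPU_ALLOW_GROWTH"] = "true"

import tensorflow as tf  # picks up the variable at GPU init time
```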

I can imagine that this is not of much relevance in industry, where there is plenty of hardware for every practitioner, but in academia the story is quite the opposite, and scarcity of hardware sometimes means multiple practitioners working on a single piece of CUDA-capable hardware in parallel. To give some examples from experience:

  • JupyterHub for a class/lab: multiple students running their Python jobs, each using only a fraction of the GPU's space-time, so the shared GPU resource gets utilized much better by running those jobs in parallel.
  • Multiple users prototyping their models while measuring their "GPU memory hunger" (as determined by TF), or running long training jobs in parallel.

I understand that this is not an intended use of allow_growth, but most novice practitioners don't get anywhere near 100% GPU utilization even when employing tf.data input pipeline optimizations, so the aforementioned use cases stay relevant. Also, since PyTorch behaves as if allow_growth=True by default (I'm not sure it even has a False option), this lets PyTorch and TensorFlow jobs utilize GPU resources in parallel.

Being able to impose one's own preference for this default across a whole maintained system is a great feature to have (even though quite use-case specific), and I just wish it were advertised a bit better than just this.

I hope this clarifies my viewpoint enough.

Thanks for your time and for any reply!

@lmartak
Contributor Author

left a comment

I'm quite satisfied with this reword.

@tfboyd

Member

commented May 7, 2019

More than enough to justify it. You had me at notebooks and needing to set this outside the user's code. I would like the updated wording to say something like "this is another way to set the value." I hope to come back to this later today, have a quick back-and-forth, and then submit before the week is over at the latest. After all your typing and reading, I don't want this to get lost.

@lamberta

Member

commented May 7, 2019

Thanks @tfboyd and @lmartak.
Reworded the text a bit, but we're back at Lukas's question about TF2: can the TF2 GPU guide also be updated with this? r2/guide/using_gpu.md

@tfboyd

Member

commented May 7, 2019

This feature is supposed to work with TF 2.0, so yes. If it does not work, that is a bug.
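
For reference, a minimal sketch of the TF 2.x in-code counterpart (the environment variable enables the same behavior without touching user code):

```python
import tensorflow as tf

# TF 2.x: request memory growth per GPU; must run before the GPUs are initialized.
for gpu in tf.config.experimental.list_physical_devices('GPU'):
    tf.config.experimental.set_memory_growth(gpu, True)
```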

@lamberta lamberta added ready to pull and removed question labels May 7, 2019

@lamberta

Member

commented May 7, 2019

Thanks. Also added it to the r2 guide, since it should work :)

@lmartak

Contributor Author

commented May 8, 2019

Beautiful, both in a single PR. Thanks for the closure on intended future support! Also, it's nice to see the effort getting some use, as you guys work so fast! :)

Long live TensorFlow!

@TensorFlow-Docs-Copybara TensorFlow-Docs-Copybara merged commit cebdf59 into tensorflow:master May 8, 2019

2 checks passed

cla/google All necessary CLAs are signed
import/copybara Change imported to the internal review system

TensorFlow-Docs-Copybara pushed a commit that referenced this pull request May 8, 2019

Copybara-Service
Merge pull request #558 from lmartak:patch-1
PiperOrigin-RevId: 247225427