
Separate dev baseline and cuda layers #47

Closed · wants to merge 6 commits

Conversation

@bhack (Contributor) commented Nov 25, 2021

This pull request separates the baseline and GPU layers so that CPU and GPU images can be built conditionally in a subsequent PR.

See more at:
https://discuss.tensorflow.org/t/adopting-open-source-dockerfiles-for-official-tf-nightly-ci/6050/4
tensorflow/addons#2613

@bhack (Contributor, Author) commented Nov 26, 2021

/cc @fsx950223 let me know if you have any feedback

@bhack mentioned this pull request Nov 26, 2021
@angerson (Contributor) commented
What, specifically, could be accomplished with this change that could not be done with the current set of images?

@bhack (Contributor, Author) commented Nov 29, 2021

This was already discussed in the forum thread and in the ticket mentioned above.
With this change we get an intermediate CPU/baseline image/target that does not carry the mandatory 4.24 GB single layer containing CUDA.
In many use cases we don't want the overhead of invalidating a 4.24 GB layer (re-downloading it, extracting it, and occupying disk or cloud storage).

The CPU/baseline stage could also be useful later for building other kinds of images that are not CUDA-centric, e.g. ROCm.

Preparing many PRs doesn't require the CUDA overhead, so they could be prepared on a CPU-only machine with the baseline image. Also, if you develop your PR in the cloud, you save disk space and costs.

We need to support the diversity of developers, and this does not add a large overhead to the tooling, since it is simply an intermediate stage/target on top of the original design.
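The proposed split can be sketched as a multi-stage Dockerfile. This is a hypothetical illustration, not the actual Dockerfile from this PR: the stage names, base image, and package list are placeholders chosen only to show how the CUDA layer is kept out of the baseline stage.

```dockerfile
# Hypothetical sketch of the baseline/CUDA split discussed in this PR.

# Baseline stage: everything that does not depend on CUDA.
FROM ubuntu:20.04 AS baseline
RUN apt-get update && apt-get install -y --no-install-recommends \
        python3 python3-pip build-essential git \
    && rm -rf /var/lib/apt/lists/*

# GPU stage: layers the large CUDA toolchain on top of the baseline.
# (Package name is illustrative; a real image would configure the
# NVIDIA apt repository or start from an nvidia/cuda base.)
FROM baseline AS gpu
RUN apt-get update && apt-get install -y --no-install-recommends \
        cuda-toolkit-11-2 \
    && rm -rf /var/lib/apt/lists/*
```

With this layout, `docker build --target baseline .` produces a CPU-only image and `docker build --target gpu .` produces the full CUDA image; the baseline layers are shared between the two, so only changes above the `gpu` stage invalidate the multi-gigabyte CUDA layer.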

@angerson (Contributor) commented
Ok, let's put this idea on hold until the size prevents us from doing something we otherwise would do. For example, if we're unable to run tests or provide some other utility because the size of the container is too large. For this quarter, I am restricting all development on these images to what my team uses in CI. Our CI doesn't need small CPU images, so any work towards those is going to create dead / unsupported extra containers, which I'm not willing to do until we have a lot more information about usage, needs, etc.

@bhack (Contributor, Author) commented Nov 29, 2021

If 6 years of TF activity are not enough for you to support this, that's ok.
But I think it will be hard to find long-time contributors when you eventually evaluate community maintainership:
[image]

@bhack (Contributor, Author) commented Mar 1, 2022

This has drifted a bit over time because of other commits on master.
In general, I don't know whether @deven-amd could be interested in this separation.

@bhack (Contributor, Author) commented Mar 21, 2022

@deven-amd Are you interested in having a CPU-only stage? Could the ROCm image rely on it?

@bhack (Contributor, Author) commented Apr 7, 2022

As nobody is interested in reusing a common, shared CPU baseline stage/image, I'm going to close this.

@bhack closed this Apr 7, 2022