Add example for kubeflow spot instance cluster config #2640

@arjun921 commented Sep 11, 2020

Helps users build a highly cost-efficient Kubeflow cluster for ML/DL training at 50-80% lower cost.

Description

Added a cost-efficient Kubeflow cluster spec with:

  • Spot GPU instances
  • Scale down to zero
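
For readers who have not opened the diff, a config combining these two points might look roughly like the sketch below. It is not the file added in this PR: the cluster name, region, instance types, sizes, and the `lifecycle` label are placeholders chosen for illustration. It also assumes the Kubernetes cluster autoscaler is deployed separately, since that component (not eksctl itself) is what scales the GPU nodegroup back down to zero.

```yaml
# Hypothetical sketch of an eksctl ClusterConfig; all names and sizes are assumptions.
apiVersion: eksctl.io/v1alpha5
kind: ClusterConfig

metadata:
  name: kubeflow-spot-example   # placeholder cluster name
  region: us-east-1             # placeholder region

nodeGroups:
  # Small on-demand group so the Kubeflow system pods always have a node.
  - name: ng-system
    instanceType: m5.xlarge
    minSize: 1
    maxSize: 3
    desiredCapacity: 1
    iam:
      withAddonPolicies:
        autoScaler: true        # IAM permissions needed by the cluster autoscaler

  # Spot GPU group that can sit at zero nodes while no training job is pending.
  - name: ng-gpu-spot
    minSize: 0
    maxSize: 4
    desiredCapacity: 0
    instancesDistribution:
      instanceTypes: ["g4dn.xlarge", "g4dn.2xlarge"]   # placeholder GPU types
      onDemandBaseCapacity: 0
      onDemandPercentageAboveBaseCapacity: 0           # 0% on-demand, i.e. all Spot
      spotInstancePools: 2
    labels:
      lifecycle: Ec2Spot
    tags:
      # Lets the autoscaler know which labels a new node will carry
      # when it has to scale this group up from zero.
      k8s.io/cluster-autoscaler/node-template/label/lifecycle: Ec2Spot
    iam:
      withAddonPolicies:
        autoScaler: true
```

A file like this would typically be applied with `eksctl create cluster -f <file>`; training pods would then select the Spot group with a `nodeSelector` on the assumed `lifecycle: Ec2Spot` label.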

Checklist

  • Added tests that cover your change (if possible)
  • Added/modified documentation as required (such as the README.md, or the userdocs directory)
  • Manually tested
  • Added labels for change area (e.g. area/nodegroup), target version (e.g. version/0.12.0) and kind (e.g. kind/improvement)
  • Make sure the title of the PR is a good description that can go into the release notes

@martina-if left a comment

Hi @arjun921, very nice and complete example! Thank you!
My only nitpick is that it would be nice to split the long lines (lines 17, 93, etc.) into shorter ones.

Other than that I think this can be rebased, approved and merged :)

arjun921 commented Sep 15, 2020

Hi @martina-if,

That's a really good catch!

I'll make those changes and push soon.

I have one question, though: would it be helpful if I included a link to a detailed blog post as a comment in the file?

Here's the blog:
https://blog.gofynd.com/how-we-reduced-our-ml-training-costs-by-78-a33805cb00cf

If it helps, I could include it the same way as on line 3 of the source:
https://github.com/arjun921/aws-spot-instances-kubeflow/blob/master/envs/staging/cluster-spec.yml

@martina-if

Yeah, I think the link could be useful for users 👍

@arjun921

@martina-if All requested changes done!
Please let me know if there's anything that I might have missed out on.

@martina-if changed the title from "Added kubeflow spot instance cluster spec" to "Add example for kubeflow spot instance cluster config" on Sep 15, 2020
@martina-if added the labels kind/docs (User documentation) and skip-release-notes (Causes PR not to show in release notes) on Sep 15, 2020
@arjun921 force-pushed the add-kubeflow-spot-instance-cluster-spec branch from c2428ff to e304f66 on September 15, 2020 17:15
NodeGroup.nodeGroups.desiredCapacity of type int
@arjun921

@martina-if I've updated the test cases,
so it's hopefully ready to merge 😬

@michaelbeaumont merged commit f8b45cf into eksctl-io:master on Sep 15, 2020