Reduce GPU Requirements for Getting Started Guide #253

Open
danehans opened this issue Jan 29, 2025 · 5 comments

Labels
documentation: Improvements or additions to documentation
good first issue: Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted: Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.

Comments

danehans (Contributor) commented Jan 29, 2025

Currently, the vLLM deployment requires 3 replicas. We should consider using 1 replica to reduce GPU requirements. With 1 replica, the guide can still demonstrate LoRA-based load balancing.

danehans (Contributor, Author) commented

Thoughts @ahg-g @kfswain?

ahg-g (Contributor) commented Jan 29, 2025

Yes, and also set max-lora to 2 instead of 4 so that smaller GPUs can be used.

kfswain (Collaborator) commented Jan 29, 2025

Maybe we make some callouts.

Reducing replica count to 1 may defeat the purpose of demoing a routing tool should someone choose to experiment/benchmark.

But agreed that accelerators are expensive and highly sought after, so we will make some callouts to adjust the vLLM deployment based on what you have.
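
For anyone who wants to try the adjustments discussed above before the guide is updated, here is a minimal sketch in Python. It is not the project's tooling. It assumes the "max-lora" setting mentioned in this thread corresponds to vLLM's `--max-loras` argument and that the deployment is a standard Kubernetes Deployment loaded as a dict. The function name `shrink_vllm_deployment`, the example manifest, and its argument values are hypothetical.

```python
import copy


def shrink_vllm_deployment(manifest, replicas=1, max_loras=2, flag="--max-loras"):
    """Return a copy of a Deployment-shaped dict with fewer replicas and a lower LoRA cap.

    Assumption: the LoRA cap is passed to the container as `--max-loras N`
    or `--max-loras=N` in the container args.
    """
    patched = copy.deepcopy(manifest)  # leave the caller's manifest untouched
    patched["spec"]["replicas"] = replicas

    for container in patched["spec"]["template"]["spec"]["containers"]:
        if "args" not in container:
            continue
        args = container["args"]
        new_args = []
        skip_next = False
        for i, arg in enumerate(args):
            if skip_next:
                # This element was the old value of the flag; drop it.
                skip_next = False
                continue
            if arg == flag:
                # Two-element form: ["--max-loras", "4"]
                new_args += [flag, str(max_loras)]
                skip_next = i + 1 < len(args)
            elif arg.startswith(flag + "="):
                # Single-element form: "--max-loras=4"
                new_args.append(f"{flag}={max_loras}")
            else:
                new_args.append(arg)
        container["args"] = new_args
    return patched


# Hypothetical manifest for illustration only; the real guide's manifest may differ.
example = {
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {
                "containers": [
                    {
                        "name": "vllm",
                        "args": ["--enable-lora", "--max-loras", "4", "--max-lora-rank=8"],
                    }
                ]
            }
        },
    }
}

patched = shrink_vllm_deployment(example)
print(patched["spec"]["replicas"])
print(patched["spec"]["template"]["spec"]["containers"][0]["args"])
```

Passing `replicas=2` instead keeps more than one backend available, which matters if the goal is to experiment with or benchmark routing across replicas.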

danehans added the documentation, good first issue, and help wanted labels Jan 30, 2025

anshuman-agarwala commented

/assign

ahg-g (Contributor) commented Feb 3, 2025

Good point, @kfswain; another option is to use a kustomize template.
