Reduce GPU Requirements for Getting Started Guide #253

danehans · 2025-01-29T18:06:05Z

Currently, the vLLM deployment requires 3 replicas. We should consider using 1 replica to reduce GPU requirements. With 1 replica, the guide can still demonstrate LoRA-based load balancing.


danehans · 2025-01-29T18:07:19Z

Thoughts @ahg-g @kfswain?

ahg-g · 2025-01-29T18:09:07Z

Yes, and also use max-lora 2 instead of 4 so that the guide can run on smaller GPUs.

kfswain · 2025-01-29T18:58:06Z

Maybe we make some callouts.

Reducing replica count to 1 may defeat the purpose of demoing a routing tool should someone choose to experiment/benchmark.

But agreed that accelerators are expensive and highly sought after, so we will make some callouts to adjust the vLLM deployment based on what you have.

anshuman-agarwala · 2025-02-03T04:25:36Z

/assign

ahg-g · 2025-02-03T09:14:57Z

Good point @kfswain; another option is to use a kustomize template.

danehans added the documentation, good first issue, and help wanted labels · Jan 30, 2025

k8s-ci-robot assigned anshuman-agarwala Feb 3, 2025

anshuman-agarwala mentioned this issue Feb 3, 2025

Reduced GPU requirements #272

Closed
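
For readers who want to try the adjustments discussed above, here is a minimal Python sketch of what "1 replica, max-lora 2" could look like when applied to a Deployment manifest loaded as a dict. It is an illustration only, not the change made in #272. It assumes that "max-lora" refers to vLLM's `--max-loras` argument, and the manifest fragment, container name, and the helper `reduce_gpu_footprint` are hypothetical.

```python
import copy


def reduce_gpu_footprint(deployment, replicas=1, max_loras=2):
    """Return a copy of a Deployment manifest (dict) with fewer replicas
    and a lower --max-loras value on every container that sets it.

    Assumption: the vLLM container passes --max-loras via its args list.
    """
    patched = copy.deepcopy(deployment)
    patched["spec"]["replicas"] = replicas
    for container in patched["spec"]["template"]["spec"]["containers"]:
        args = container.get("args", [])
        for i, arg in enumerate(args):
            # Handle both "--max-loras=4" and "--max-loras", "4" forms.
            if arg.startswith("--max-loras="):
                args[i] = f"--max-loras={max_loras}"
            elif arg == "--max-loras" and i + 1 < len(args):
                args[i + 1] = str(max_loras)
    return patched


# Hypothetical manifest fragment; not copied from the getting started guide.
example = {
    "spec": {
        "replicas": 3,
        "template": {
            "spec": {
                "containers": [
                    {"name": "vllm", "args": ["--enable-lora", "--max-loras", "4"]},
                ]
            }
        },
    },
}

patched = reduce_gpu_footprint(example)
```

The helper leaves the input untouched and returns a patched copy, so the 3-replica default can stay in place for anyone who wants to experiment with or benchmark routing across replicas, as raised earlier in the thread.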
