Reduce GPU Requirements for Getting Started Guide #253
Labels
documentation
Improvements or additions to documentation
good first issue
Denotes an issue ready for a new contributor, according to the "help wanted" guidelines.
help wanted
Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines.
Currently, the vLLM deployment requires 3 replicas. We should consider using 1 replica to reduce GPU requirements. With 1 replica, the guide can still demonstrate LoRA-based load balancing.
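For illustration, the proposed change might amount to editing the `replicas` field of the vLLM Deployment manifest. The excerpt below is only a sketch: the Deployment name `vllm-example` is a placeholder, not the name used in the guide, and all other fields of the real manifest are omitted.

```yaml
# Hypothetical excerpt of the vLLM Deployment manifest.
# The name below is a placeholder; the guide's actual manifest may differ.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-example
spec:
  replicas: 1  # reduced from 3, as proposed in this issue
```

Since each replica presumably requests its own GPU, lowering the count from 3 to 1 should cut the GPU requirement for following the guide accordingly.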